Migrating to DSpace

19/12/2017

Are you considering migrating your current repository to DSpace, but wondering what your options are? It might be easier than you think.

Over the years, Atmire has carried out migrations from a variety of platforms, including EPrints, Fedora, Digital Commons, CONTENTdm, DigiTool, Equella and homegrown software. Because DSpace offers a wide array of commonly used import facilities and formats, migrating to it may require less work and carry less risk than you might expect.

By exporting or transforming your existing content into one of the formats below, you can leverage standard DSpace facilities for imports.

Archival Information Package (AIP)

If your current system has METS, MODS or PREMIS export facilities, the DSpace AIP format can be a good fit for your exports and for imports into DSpace.

Capable of containing both metadata and bitstreams (assets), AIP packages can be constructed at different levels of granularity, from an entire collection down to a single bitstream.

Furthermore, AIP is one of the few import formats that allows you to specify authorizations on specific objects, which is key if the content you are migrating is not entirely open access.
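Before importing packages produced by another system, it can help to check what each one claims to contain. The sketch below is only an illustration: it assumes an AIP is a Zip file with a METS manifest named mets.xml at its top level, and that the manifest's root element carries TYPE and OBJID attributes, which is our reading of the DSpace 6 AIP format documentation linked below. The function name describe_aip is hypothetical.

```python
import zipfile
from xml.etree import ElementTree as ET


def describe_aip(path_or_file):
    """Summarise a Zip-based AIP: object type, object id, and packaged files.

    Assumption: the manifest is a top-level file called mets.xml and its
    root element has TYPE and OBJID attributes.
    """
    with zipfile.ZipFile(path_or_file) as package:
        names = package.namelist()
        if "mets.xml" not in names:
            raise ValueError("no mets.xml manifest at the top level of the package")
        root = ET.fromstring(package.read("mets.xml"))
    return {
        "object_type": root.get("TYPE"),   # e.g. an item or a collection
        "object_id": root.get("OBJID"),    # typically a handle-based identifier
        "files": [name for name in names if name != "mets.xml"],
    }
```

Running this over a directory of exported packages gives a quick inventory of how many collections, items and files a migration batch holds before anything is ingested.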

More information:

https://wiki.duraspace.org/display/DSDOC6x/AIP+Backup+and+Restore

https://wiki.duraspace.org/display/DSDOC6x/DSpace+AIP+Format

https://wiki.duraspace.org/display/DSDOC6x/Importing+and+Exporting+Content+via+Packages

Simple Archive Format (SAF)

If you are looking for a more lightweight approach, and XML does not scare you, the Simple Archive Format (SAF) might be what you are looking for.

SAF is also capable of ingesting both metadata and assets, but its granularity is the item: each import brings in one or more items. An item in DSpace is a metadata record with zero or more assets attached to it.

The most important limitations of the SAF import are that it is agnostic to authorizations and that a single export or import operation is not suitable for thousands of items in one go.
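To make the item-level structure concrete, here is a minimal Python sketch that writes one SAF item directory. It assumes the layout described in the DSpace 6 documentation as we understand it: one directory per item, holding a dublin_core.xml file with dcvalue elements, a contents file listing one bitstream filename per line, and the files themselves. The function name write_saf_item and the item_000 naming scheme are illustrative choices, not part of DSpace.

```python
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET


def write_saf_item(archive_dir, index, metadata, files=()):
    """Write one SAF item directory and return its path.

    metadata maps (element, qualifier) pairs to lists of values;
    a qualifier of None is written as "none".
    """
    item_dir = Path(archive_dir) / f"item_{index:03d}"
    item_dir.mkdir(parents=True, exist_ok=True)

    # dublin_core.xml: one <dcvalue> per metadata value
    root = ET.Element("dublin_core")
    for (element, qualifier), values in metadata.items():
        for value in values:
            attrs = {"element": element, "qualifier": qualifier or "none"}
            ET.SubElement(root, "dcvalue", attrs).text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # Copy each asset next to the metadata and list it in the contents file
    names = []
    for source in files:
        source = Path(source)
        shutil.copy(source, item_dir / source.name)
        names.append(source.name)
    (item_dir / "contents").write_text(
        "".join(name + "\n" for name in names), encoding="utf-8"
    )
    return item_dir
```

Calling the function once per exported record, with an increasing index, yields an archive directory that can then be split into batches of a manageable size.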

More information:

https://wiki.duraspace.org/display/DSDOC6x/Importi...

Metadata in bibliographic formats and spreadsheets

If the scope of your migration is limited to metadata and does not include any actual assets, you can batch import a wide array of structured formats to produce items without any bitstreams attached, including:

  • EndNote
  • BibTeX
  • RIS
  • TSV
  • CSV
  • OAI
  • arXiv
  • PubMed
  • CrossRef
  • CiNii

Once ingested into DSpace, you can either have these items pass through a manual reviewing and enrichment process, or just attach bitstreams and set authorizations later.
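For the spreadsheet route, the sketch below turns exported records into CSV text. It assumes the conventions of DSpace 6 batch metadata editing as we understand them: an id column holding "+" for items to be created, a collection column holding the target collection's handle, one column per metadata field, and "||" between multiple values in one cell. The function name build_metadata_csv and the handle used in the usage example are hypothetical.

```python
import csv
import io


def build_metadata_csv(records, collection_handle):
    """Return CSV text with one row per new metadata-only item.

    records is a list of dicts mapping field names such as "dc.title"
    to lists of values.
    """
    # Use every field that occurs in any record, in a stable order
    fields = sorted({field for record in records for field in record})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "collection"] + fields)
    for record in records:
        row = ["+", collection_handle]  # "+" marks an item to be created
        for field in fields:
            row.append("||".join(record.get(field, [])))
        writer.writerow(row)
    return buffer.getvalue()
```

For example, build_metadata_csv([{"dc.title": ["A title"]}], "123456789/2") produces a header row followed by one row for a single new item in that collection.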

More information:

https://wiki.duraspace.org/pages/viewpage.action?p...

https://wiki.duraspace.org/display/DSDOC6x/Batch+M...

OAI-PMH and OAI-ORE

The import formats described above require you to create batch exports in a particular format and then use command-line scripts on your new DSpace server to ingest them.

If your current repository has an OAI-PMH or OAI-ORE endpoint, you might be able to skip these steps and the associated work, and harvest your content directly from your existing repository, into your new DSpace.

From the DSpace administrator user interface that allows you to manage collections, you can hook up a particular collection to an external OAI-PMH or OAI-ORE endpoint, e.g. your old repository. Once configured, DSpace will pick up the items from your old repository and create new ones in DSpace.

OAI-PMH is sufficient if you only need to harvest metadata, while OAI-ORE also gives you the opportunity to bring files along.

Very likely, your (public) OAI-PMH and OAI-ORE endpoints are not configured to allow the dissemination of restricted content. This means additional configuration on your old repository might be required to harvest restricted content with this method.
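Before pointing a DSpace collection at your old repository, you may want to check what its OAI-PMH endpoint actually exposes. The following Python sketch builds a standard ListRecords request URL and extracts record identifiers and the resumption token from a response. The verb, the metadataPrefix and resumptionToken arguments and the XML namespace come from the OAI-PMH 2.0 protocol; the function names are our own, and fetching the URL is left to the HTTP client of your choice.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # A resumption token is an exclusive argument: it replaces the others
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    has_token = token_el is not None and token_el.text and token_el.text.strip()
    return identifiers, (token_el.text.strip() if has_token else None)
```

Requesting pages until no resumption token is returned, and comparing the number of identifiers with the item count in the old repository, shows whether restricted records are missing from the public endpoint.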

Keeping your repository URLs alive

Your old repository service may have promised or even guaranteed persistence of the URLs it has assigned to your existing metadata records and assets. There are different ways to keep this promise, even if the old repository is entirely discontinued.

If your old repository supported and has issued handle.net URLs, you're in luck. In dialogue with CNRI, which manages the handle.net infrastructure, you can hand over the resolution of your existing handle.net prefix to your new DSpace repository.

However, regardless of whether you have been using handle.net URLs or not, search engines and normal users may also have been referring to your repository content with the hostname and URL under which the old repository was hosted.

Different redirect and resolution strategies can be applied so that, at the very minimum, users accessing an old URL land on the homepage of the new repository rather than hitting a generic "404 - Page Not Found" message.
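One such strategy can be sketched as a lookup from old identifiers to new handles, with the homepage as the fallback. This is a minimal illustration under stated assumptions: the old repository's record identifier is assumed to appear as a path segment in its URLs, the mapping from old identifiers to handles is assumed to have been recorded during the import, and the new repository is assumed to serve items under /handle/ followed by the handle. The function name resolve_old_url is hypothetical.

```python
from urllib.parse import urlsplit


def resolve_old_url(old_url, id_to_handle, new_base_url):
    """Return the URL on the new repository that an old URL should redirect to."""
    base = new_base_url.rstrip("/")
    # Look for a known old identifier among the path segments of the old URL
    segments = [segment for segment in urlsplit(old_url).path.split("/") if segment]
    for segment in segments:
        handle = id_to_handle.get(segment)
        if handle:
            return f"{base}/handle/{handle}"
    # Unknown URL: fall back to the homepage rather than a 404
    return base + "/"
```

In practice the same mapping would usually be exported as rewrite rules for the web server in front of DSpace, so that redirects are served without running any application code.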

Working with a DuraSpace Registered Service Provider

If the information above still leaves you at a loss as to where to start, or if you simply don't have the time, you can always engage a DuraSpace Registered Service Provider like Atmire.

Whatever structured or unstructured format your current system can produce, our engineers can assist with the analysis, the transformation process, the imports and, finally, the rollout of your new DSpace repository.

We have done this at various scales, from a limited number of items to more than 500,000 items, and from a few gigabytes to multiple terabytes of content, in a wide array of source formats.

Contact us at info@atmire.com to discuss the challenges and the goals of your migration.