Impact of Plan S implementation guidelines on DSpace repositories

17/12/2018

Atmire reflections on the requirements for Plan S compliant Open Access repositories.

cOAlition S recently released guidance on the implementation of Plan S, an initiative by European funders to make full and immediate Open Access a reality by 2020. Of particular interest to repositories is section 10.2 of this guidance: Requirements for Plan S compliant Open Access repositories.

The Confederation of Open Access Repositories (COAR) has issued an initial response to the guidance. In this article, Atmire analyses each of the requirements and the COAR feedback, and offers a perspective on what the implications might be for DSpace repositories.

Automated manuscript ingestion facility

Where metadata or full text files of published works are already available online in a machine readable format, repositories have a great opportunity to streamline their ingest processes. With less manual work in the submission process, content can go live in repositories faster.

Modern versions of DSpace and Open Repository already provide enough automation that publication metadata doesn't need to be entered from scratch: the ingest process can start with a search in external sources including PubMed, Crossref, ScienceDirect and Scopus. If the corresponding item is found, the metadata can be imported automatically, reducing both the chance of human error and the time needed to publish the record in the repository.
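To make the import step more concrete, here is a minimal sketch of how a record in the JSON shape returned by the Crossref REST API could be mapped to repository metadata. It is not the DSpace implementation. The function name and the target field names (in particular dc.identifier.doi) are assumptions for illustration, and a real repository would use whatever fields its local metadata registry defines.

```python
from typing import Any


def crossref_to_dc(work: dict[str, Any]) -> dict[str, list[str]]:
    """Map a Crossref-style work record to DSpace-style metadata fields.

    The target field names are illustrative assumptions, not a fixed standard.
    """
    # Crossref lists authors as objects with "family" and "given" names.
    authors = [
        ", ".join(part for part in (a.get("family"), a.get("given")) if part)
        for a in work.get("author", [])
    ]

    # "issued" holds nested date parts, e.g. {"date-parts": [[2018, 12, 17]]}.
    # Month and day may be absent, and the year itself can be null.
    date_parts = (work.get("issued", {}).get("date-parts") or [[]])[0]
    issued = "-".join(f"{int(p):02d}" for p in date_parts if p is not None)

    record = {
        "dc.title": list(work.get("title", [])),
        "dc.contributor.author": [a for a in authors if a],
        "dc.date.issued": [issued] if issued else [],
        "dc.identifier.doi": [work["DOI"]] if work.get("DOI") else [],
    }
    # Drop fields for which the source record had no usable value.
    return {field: values for field, values in record.items() if values}
```

A submitter would then only need to review the pre-filled fields, rather than type them in by hand.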

For several years, some institutions using DSpace have been receiving automated repository deposits from publishers, including BioMed Central, through the SWORD API.

Finally, the UK Publications Router has been making great leaps forward in its DSpace integration efforts, offering an automated triaging and depositing system that is also based on SWORDv2.
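As an illustration of what such a machine-to-machine deposit carries, the sketch below builds the kind of Atom entry with Dublin Core terms that a SWORDv2 client can send to a repository when creating a resource from metadata. The function name and the choice of elements are assumptions for illustration. The actual HTTP request, authentication and the collection URL are left out, since they depend on the repository being deposited into.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Serialize Atom as the default namespace and Dublin Core terms as "dcterms:".
ET.register_namespace("", ATOM_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def build_deposit_entry(title: str, creators: list[str], doi: str) -> str:
    """Build an Atom entry describing one publication (illustrative only)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title

    # Descriptive metadata travels as Dublin Core terms inside the entry.
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}title").text = title
    for name in creators:
        ET.SubElement(entry, f"{{{DCTERMS_NS}}}creator").text = name
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}identifier").text = f"https://doi.org/{doi}"

    return ET.tostring(entry, encoding="unicode")
```

A client would typically POST this document to a collection URL advertised by the repository, using the Atom entry content type.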

Of course, driving this automation further will come with its own set of challenges. But we believe our community has already taken important steps forward and that, with or without Plan S, automated ingestion is high on the wishlist of many repository managers and their institutions.

Full text stored in XML in JATS standard (or equivalent)

It is true that today, most repositories are storing and serving large numbers of MS Word and PDF files. Finding publications in open and standard machine readable formats such as JATS XML is the exception, not the rule.

In our opinion, the fact that Plan S states this requirement does not automatically imply that repositories will be responsible for authoring content in, or converting it into, open and standard formats. At the very least, it means that if an author or automated service wants to deposit a JATS XML file, the repository should accept and serve it as such, and refrain from only serving transformed, less open versions.
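Accepting a JATS file "as such" still leaves room for a light sanity check at ingest time. The following is a minimal sketch, assuming a JATS article without an XML namespace and the usual front/article-meta structure. The function name is hypothetical, and this does not validate the file against the JATS schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def summarize_jats(xml_text: str) -> dict[str, Optional[str]]:
    """Extract the article title and DOI from a JATS XML document."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    if root.tag != "article" or meta is None:
        raise ValueError("not a JATS article: expected <article>/<front>/<article-meta>")

    title_el = meta.find("title-group/article-title")
    doi_el = meta.find("article-id[@pub-id-type='doi']")

    # Titles may contain inline markup such as <italic>, so gather all text
    # nodes and normalize the whitespace between them.
    title = " ".join("".join(title_el.itertext()).split()) if title_el is not None else None
    doi = doi_el.text.strip() if doi_el is not None and doi_el.text else None
    return {"title": title, "doi": doi}
```

The extracted values could be compared with the item's metadata, while the XML file itself is stored and served unchanged.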

Publishing platforms like F1000 Research already offer XML formats today, and the corresponding item for this example publication in DSpace shows that it is possible to serve these XMLs in a repository.

Initiatives like INK, the file conversion engine from the Coko Foundation, are making solid and open progress in this area.

Admittedly, getting a publication authored and published in an open, machine readable XML format like JATS is not trivial, but it shouldn't be seen as a responsibility that falls completely and exclusively within the scope of the institutional repository.

Quality assured metadata in standard interoperable format

The full requirement reads:

Quality assured metadata in standard interoperable format, including information on the DOI of the original publication, on the version deposited (AAM/VoR), on the open access status and the license of the deposited version. The metadata must fulfil the same quality criteria as Open Access journals and platforms (see above). In particular, metadata must include complete and reliable information on funding provided by cOAlition S funders. OpenAIRE compliance is strongly recommended.

This is an area where OpenAIRE and RIOXX have already laid the groundwork in Europe. With a RIOXX patch available for DSpace, which has also been adopted as a standard feature in Open Repository, getting the proper metadata fields in place is unlikely to be a major technical challenge.

Two other standard DSpace features contribute to this: the integration with CC licenses, and the integration with the SHERPA/RoMEO API, which makes it easy to discover publishers' specific open access licensing terms.
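Having the fields in place is one thing, and checking that they are filled in is another. As a rough sketch of what a quality check against the requirement quoted above could look like, the function below reports which of the named elements are missing from a record. The record layout and the key names are hypothetical. A real check would read the corresponding RIOXX or OpenAIRE fields configured in the repository.

```python
# Elements named in the Plan S requirement; the key names are illustrative.
REQUIRED_FIELDS = ("doi", "version", "access_status", "license", "funder")

# The requirement mentions two deposited versions: the author accepted
# manuscript (AAM) and the version of record (VoR).
ACCEPTED_VERSIONS = {"AAM", "VoR"}


def plan_s_metadata_problems(record: dict[str, str]) -> list[str]:
    """Return the names of required fields that are missing or invalid."""
    problems = [
        field for field in REQUIRED_FIELDS
        if not record.get(field, "").strip()
    ]
    version = record.get("version", "").strip()
    if version and version not in ACCEPTED_VERSIONS:
        problems.append("version")
    return problems
```

An empty result means the record carries all of the listed elements. It says nothing yet about whether the values themselves are correct.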

Open API to allow others (including machines) to access the content

DSpace ships by default with OAI-PMH, OAI-ORE and REST API interfaces. An implementation of ResourceSync, as recommended by COAR, is also already available, although it has not yet been merged into the core DSpace codebase.

So in terms of Open APIs, DSpace and its community of users are well positioned to comply quickly, compared to platforms that may not yet have these facilities.
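To illustrate what machine access through one of these interfaces looks like, here is a minimal sketch of an OAI-PMH harvesting client: one function builds a ListRecords request URL, the other reads identifiers and titles from an oai_dc response. The function names are assumptions, and any base URL passed in (such as a repository's OAI endpoint) is site specific. Fetching the URL and following resumption tokens are left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH responses and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text: str) -> list[tuple[str, list[str]]]:
    """Return (identifier, titles) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [t.text or "" for t in record.iterfind(".//dc:title", NS)]
        records.append((identifier, titles))
    return records
```

Because the protocol is standardized, the same client code works against any compliant repository, which is the point of requiring an open API.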

QA process to integrate full text with core abstract and indexing services (for example PubMed)

Our interpretation of this requirement is that Plan S wants to maximize the number of people who find their way to full text in repositories. The target audience here is users searching on indexing services that, today, do not always provide links to the full text in repositories.

Google Scholar already attempts to directly serve the link to full text in repositories, when it is able to efficiently crawl and index the repositories.
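One mechanism that helps such crawlers is bibliographic meta tags on the item page, which Google Scholar's inclusion guidelines describe. The sketch below generates a small set of these tags. The function name is hypothetical, the tag selection is a minimal assumption rather than a complete list, and it is not a description of how DSpace renders its pages.

```python
from html import escape


def scholar_meta_tags(title: str, authors: list[str],
                      publication_date: str, pdf_url: str) -> list[str]:
    """Render citation meta tags for the <head> of an item page."""
    pairs = [("citation_title", title)]
    # One citation_author tag per author, in the order of the byline.
    pairs += [("citation_author", author) for author in authors]
    pairs += [
        ("citation_publication_date", publication_date),
        # Direct link to the full text file held by the repository.
        ("citation_pdf_url", pdf_url),
    ]
    # Escape values so quotes or ampersands cannot break the attribute.
    return [
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    ]
```

The citation_pdf_url tag is the one that points an indexing service directly at the full text in the repository.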

PubMed offers LinkOut, allowing repositories to get their full text links immediately advertised in PubMed.

It is not entirely clear how a single QA process could deal with diverging requirements from different indexing services. At the very least, though, clearly advertising contact points for the repository and actively engaging in dialogue with these indexing services could be two cornerstones of QA and active policy. What this requirement can achieve is to encourage repositories and their managers to take a more active role in getting those integrations effectively in place.

Continuous availability and helpdesk

The unavailability or slow performance of an institutional repository may affect the overall online reputation of an institution. It is certainly not a bad thing that Plan S reiterates these points. Our experience is that most of the DSpace users and institutions we get to interact with are already very committed to running a professional, responsive service with clear contact points. They already realise the extent to which the repository contributes to the online image of the institution as a whole, and are not waiting for Plan S to step up to this challenge.

Because it IS a challenge. Running a professional, large scale repository service is far from trivial. As a single example, the volume of "Request a copy" requests that a successful repository receives on a daily basis should not be underestimated. It is large enough that an institution like Cambridge apologises to its users in advance for not being able to respond to these requests as normal during the upcoming holidays.

Conclusion

Immediate and full Open Access is an ambitious goal to strive for together, in 2019 and beyond. The guidance may not be complete or perfect, but it sets out a concise and straightforward vision that is unlikely to conflict with other goals and ambitions of DSpace based repositories.