CARL Collaboration with OpenAIRE

Project Overview

From 2018 to 2021, CARL (via its Open Repositories Working Group – ORWG) entered into a collaboration with OpenAIRE as part of the OpenAIRE Advance project. The ultimate aim of this collaboration was to provide a gateway to Canadian scholarly content, with an initial emphasis on representing the publications stemming from Tri-Agency funded projects in the OpenAIRE discovery portal.

Much of the CARL work took place behind the scenes and included the adoption of the OpenAIRE guidelines in Canadian repositories and journal platforms, working with OpenAIRE to ensure Tri-Agency affiliations are accurate, and developing workflows for curating metadata records.

The CARL OpenAIRE Task Group completed the pilot implementation phase in 2021. Several repositories are now harvested by OpenAIRE, and Canada’s federal granting agencies (CIHR, NSERC and SSHRC) have now been integrated into the OpenAIRE aggregation. Using the OpenAIRE Explore tool, users can limit results by agency to find funded research outputs. It is important to note that only research articles (or other types of output) that acknowledge the funder name are currently linked to each agency.

In 2022, the Canadian OpenAIRE Task Force was created to oversee the next phase of the CARL-OpenAIRE collaboration, with the goals of extending the collaboration to additional institutions and ensuring its sustainable management in the future.

About OpenAIRE

OpenAIRE is a discovery service for articles and other types of content funded by the European Commission (EC). It was initially developed to help the EC track its funded research outputs and monitor compliance with its open access policy. OpenAIRE aggregates metadata and full text content from thousands of data providers around the world, and searches for affiliation information to identify relationships between funders and research outputs (such as articles and data). OpenAIRE services are freely available and can be accessed via the OpenAIRE website.

OpenAIRE (and repository networks in general) offers a mechanism for bringing together the repository community. It incentivizes a certain level of interoperability across Canadian repositories, provides an aggregation, and increasingly offers value-added services such as common usage statistics and metadata curation. It should be noted that as more repository software platforms become OpenAIRE compliant out of the box (e.g. DSpace 7), it will become much easier for many Canadian institutions to connect to OpenAIRE.

Last update: January 17, 2023

Why is this Project Important?

While Canada’s three major funders have an Open Access Policy on Publications that requires funded researchers to make the peer-reviewed results of their research available in open access within 12 months of publication, there is currently no simple way of tracking compliance with this policy because Canadian research articles are distributed across many publishers and repositories.

OpenAIRE aggregates metadata records and full text (when available) from thousands of data sources internationally and identifies funder relationships using text and data mining and Crossref affiliations. As such, it is able to track a wider range of research outputs than commercial databases, such as Scopus and Web of Science. By partnering with OpenAIRE, we can ensure that all relevant research outputs, including those in Canadian repositories, are represented in this corpus.

This initiative is strengthening the existing repository network in Canada, and will reduce our dependence on external players that do not have values aligned with openness and the public good. Additionally, making Canadian researcher outputs available through international discovery services such as OpenAIRE is essential for ensuring that Canadian research is visible and included in research assessment systems.

This project is also building the workflows and expertise needed for Canada to take on a more active role in managing and tracking Canadian research outputs, should we decide to do so.

How Can You Participate?

There are several ways you can participate in the project, depending on your level of interest, type of repository and available resources:

Direct harvest of literature repositories (or Canadian journals/publishers) by OpenAIRE

If you manage a Canadian repository and want your records to be represented in the OpenAIRE aggregation, the preferred route is for your repository to become compliant with the OpenAIRE Guidelines 4.0. This will ensure that the contents of your repository are regularly harvested, and that OpenAIRE has access not only to the metadata, but also to the full text, which it can data mine for funder affiliations and other relationships. It will also allow you to participate in the usage statistics services offered by OpenAIRE, which are much more accurate than the Google Analytics services you may currently be using.

The ORWG is currently working with several pilot repositories to support their compliance with OpenAIRE Metadata Guidelines version 4, and will soon release support materials to help you at your own institution. These materials will be available for several platforms:

DSpace: An add-on, developed by 4Science with funding from a group of CARL institutions, can be used to support OpenAIRE compliance in DSpace versions 5 and 6. Version 7 is expected to be compliant out of the box.
EPrints: Code has been developed by Concordia University Library and will be shared with the EPrints community.
Islandora: Simon Fraser University Libraries is planning to develop and share code for compliance with Islandora 8.
Digital Commons: Currently compliant with OpenAIRE v.3.

Indirect harvesting of literature repositories by OpenAIRE through Canada Research

For institutions that do not have the interest or resources to implement the OpenAIRE Metadata Guidelines 4.0, McMaster University Libraries has developed a Canadian aggregator, Canada Research, that harvests metadata from Canadian repositories and transforms the records to become compliant with the OpenAIRE guidelines.

This approach will ensure repository records are visible in OpenAIRE, but means that the repository cannot benefit from other value-added services offered by OpenAIRE. In this context, the Ad Hoc COVID-19 Working Group is developing shared curation workflows to allow repository managers to enhance the metadata in the Canada Research aggregation with funder affiliation and other key metadata elements.

Indirect harvesting of data repositories by OpenAIRE through FRDR (Portage)

For data repositories, we are recommending that your repository be harvested by the Federated Research Data Repository (FRDR) discovery service. FRDR offers a federated search tool that will provide a focal point to discover and access Canadian research data, while the range of services provided by FRDR will help researchers store and manage their data, preserve their research for future use, and comply with institutional and funding agency data management requirements. FRDR is also working on compliance with OpenAIRE metadata guidelines for data repositories so all data records in FRDR will be available in the OpenAIRE aggregation. If you wish to have your data repository aggregated by FRDR, contact

The COVID-19 Case Study

In support of the medical and other pandemic-related research and information sharing, OpenAIRE has developed a special search portal, OpenAIRE COVID-19 Gateway, which provides access to research outputs (publications, data, software, protocols, etc.) related to COVID-19. If your repository contents are aggregated directly by OpenAIRE or through Canada Research, the COVID-19 related records will also be available through this gateway.

Press Releases

Presentations

Update on CARL-OpenAIRE collaboration by Pierre Lasou (December 2020)
- Recording forthcoming (English)
- View presentation slides
Presentation on CARL-OpenAIRE collaboration by Pierre Lasou (April 2019)
- View recording
- View presentation slides
Poster presented at LIBER 2018 by OpenAIRE Task Group lead Pierre Lasou

Documentation

This section is intended for repository managers who are ready to start working toward implementation of the OpenAIRE Guidelines in their repository. The following documents have been prepared by members of the CARL Open Repositories Working Group’s Task Group on OpenAIRE. For support in implementing the OpenAIRE Guidelines in your repository, please contact .

The two main documents that you will wish to consult are:

OpenAIRE Implementation FAQ (below)
OpenAIRE Decision Tree (.pdf)

The Decision Tree links out to the following documents, which may also be useful to you:

OpenAIRE Implementation FAQ

The FAQs below have been developed by the ORWG to support harvesting of Canadian repositories by OpenAIRE. If you do not find the answers to your questions below, OpenAIRE hosts content provider community calls on the first Wednesday of every month, where you can ask your questions. OpenAIRE also publishes a newsletter, and you can contact its Helpdesk directly.

If you would like to contribute your own questions to this FAQ, please send them to the CARL Open Repositories Working Group’s OpenAIRE support team ().

Last updated: July 5, 2021

Implementing OpenAIRE Metadata Guidelines

What are the considerations involved in becoming OpenAIRE compliant?

To be OpenAIRE compliant and be harvested by OpenAIRE, a repository must adopt the OpenAIRE guidelines. These guidelines define how the repository should expose its metadata records. Some effort is required by repositories to become OpenAIRE compliant. A decision tree was developed to help you decide whether implementing the OpenAIRE guidelines is feasible for you. If not, you can have your repository harvested by the Canada Research aggregator.

What are the OpenAIRE metadata guidelines?

The OpenAIRE Guidelines define how metadata is exposed via the OAI-PMH protocol in order to integrate with OpenAIRE infrastructure. OpenAIRE has guidelines for different types of data providers:

In Canada, the focus has been limited to the OpenAIRE Guidelines for Literature, Institutional and Thematic Repositories; Portage's FRDR is already compliant with the OpenAIRE Guidelines for Data Archives.

What do I need to do to adopt the guidelines in my repository?

Adopting the guidelines requires some technical adjustments in your repository, which may differ depending on which repository platform you are using. There is technical documentation available for DSpace, Bepress and EPrints. See the decision tree to help guide you through the process, which links to the technical documents.

What is the minimum level of metadata needed to be OpenAIRE compliant?

The minimum metadata requirements for OpenAIRE compliance for literature repositories are outlined in the Canadian Lite OpenAIRE Literature Repository Guidelines.

I’m not sure I can implement the OpenAIRE Guidelines in my repository. Can I still participate in the OpenAIRE project?

To be harvested by OpenAIRE, you do not need to implement the full guidelines. The minimum implementation level is outlined in the Canadian Lite OpenAIRE Literature Repository Guidelines. If you are not able to meet these minimum metadata requirements, your repository can instead be harvested by the Canada Research aggregator, which is, in turn, harvested by OpenAIRE. However, this means that you will not benefit from OpenAIRE's enhancement of your records, including the text and data mining that identifies funder affiliations.

My repository is not OpenAIRE compliant, how can I be harvested by the Canada Research aggregator?

Canada Research is a metadata aggregator based at McMaster University Library. Canada Research harvests records from non-compliant repositories and transforms their metadata so that OpenAIRE can, in turn, harvest them directly from Canada Research. This document outlines the workflow for repositories to be harvested by Canada Research. You can also curate your records in the Canada Research aggregation to add relevant funder information.

If I do not have information for a specific metadata element, should I use a generic value (e.g. unknown)?

OpenAIRE recommends leaving fields empty, rather than entering placeholder values, when no information about a specific element is available. The exceptions are the date and title elements, which must have values (see more information about dates below).

I do not have the funder information for the articles I am uploading. Can I still participate?

Yes. One of the main objectives of OpenAIRE is to support tracking of funders’ research outputs. OpenAIRE uses a number of techniques to identify funder information, including text and data mining of full-text articles for funder names, and integrating and enhancing repository records with information from other sources, such as Crossref. However, it is not always possible to find the funder affiliation using these techniques, so some records in OpenAIRE will be missing the funder affiliation and will not be visible in funder-specific searches.

How should I proceed for dates if it is unknown or inferred?

A date must be estimated. It is not possible to use values such as “unknown” or “inferred” in date elements. Dates must have a four-digit year; month and day are optional.
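As a minimal sketch of this rule, the Python function below accepts YYYY, YYYY-MM or YYYY-MM-DD values and rejects placeholders such as “unknown”. The ISO 8601-style format is our assumption, based on the datacite:dates example in the next answer. The function name is hypothetical, and it does not check calendar validity (e.g. 2011-02-31 would pass).

```python
import re

# Hypothetical helper: a four-digit year is required; month and day are optional.
# Assumes ISO 8601-style values (YYYY, YYYY-MM, YYYY-MM-DD); calendar validity
# (e.g. February 31) is not checked.
DATE_PATTERN = re.compile(r"^[0-9]{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12][0-9]|3[01]))?)?$")


def is_valid_openaire_date(value: str) -> bool:
    """Return True if value has a four-digit year and no placeholder text."""
    return DATE_PATTERN.fullmatch(value.strip()) is not None


for candidate in ["2011", "2011-12", "2011-12-01", "unknown", "[2011?]"]:
    print(candidate, is_valid_openaire_date(candidate))
```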

In dates, what does “available” mean?

Dates are used in the OpenAIRE Guidelines as defined in the DataCite Metadata Schema 4.3, which defines Available as follows: “The date the resource is made publicly available”. OpenAIRE uses three different dates, which must be specified using the dateType attribute:

One for Publication Date (M): dateType is Issued
Two for the Embargo Period Date (MA):
1. To indicate the START of an embargo period, dateType is Accepted.
2. To indicate the END of an embargo period, dateType is Available.

As the publication date is often used as a reference to define an embargo period, the Issued date and the Accepted date are often the same.

For example:

<datacite:dates>
  <datacite:date dateType="Issued">2011-12-01</datacite:date>
  <datacite:date dateType="Accepted">2011-12-01</datacite:date>
  <datacite:date dateType="Available">2012-12-01</datacite:date>
</datacite:dates>
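The Python sketch below generates the dates block with the standard library's ElementTree, reproducing the three-date pattern above. The function name is hypothetical, and the DataCite namespace URI is our assumption of what the OpenAIRE v4 guidelines declare; verify it against the guidelines before use. Because the Issued and Accepted dates are often the same, the embargo start defaults to the publication date.

```python
import xml.etree.ElementTree as ET

# Assumed DataCite kernel-4 namespace URI; verify against the guidelines.
DATACITE_NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("datacite", DATACITE_NS)


def build_dates(issued, embargo_end=None, embargo_start=None):
    """Build datacite:dates.

    issued        -> dateType="Issued" (publication date, mandatory)
    embargo_start -> dateType="Accepted" (start of embargo; defaults to issued)
    embargo_end   -> dateType="Available" (end of embargo)
    """
    dates = ET.Element(f"{{{DATACITE_NS}}}dates")
    ET.SubElement(dates, f"{{{DATACITE_NS}}}date", dateType="Issued").text = issued
    if embargo_end is not None:
        start = embargo_start or issued
        ET.SubElement(dates, f"{{{DATACITE_NS}}}date", dateType="Accepted").text = start
        ET.SubElement(dates, f"{{{DATACITE_NS}}}date", dateType="Available").text = embargo_end
    return dates


print(ET.tostring(build_dates("2011-12-01", embargo_end="2012-12-01"), encoding="unicode"))
```

For an item without an embargo, calling build_dates("2020") emits only the Issued date.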

If the date used for an article in the repository is the date of acceptance of the article, can it be used in Publication Date?

If only the date of acceptance is available, and not the date the item was released, issued or published, the date of acceptance can be used instead. The dateType attribute must still be Issued. For the dateTypes Accepted and Available, OpenAIRE follows the semantics of DataCite Schema V4.3 (see p.39), which reserves these values to indicate embargo periods.

How can I distinguish “personal” vs “corporate” creator name in the metadata record?

For literature repositories, corporate and personal creators can be distinguished using the nameType attribute with the controlled list values:

Organizational
Personal

For example:

<datacite:creators>
  <datacite:creator>
    <datacite:creatorName nameType="Personal">Evans, R.J.</datacite:creatorName>
    <datacite:affiliation>Institute of Science and Technology</datacite:affiliation>
  </datacite:creator>
</datacite:creators>
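A minimal Python sketch of generating creators with the controlled nameType values follows. It refuses any value outside Personal and Organizational, so a local label such as “Corporate” is caught before export. The function names are hypothetical, and the namespace URI is an assumption to verify against the guidelines.

```python
import xml.etree.ElementTree as ET

# Assumed DataCite kernel-4 namespace URI; verify against the guidelines.
DATACITE_NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("datacite", DATACITE_NS)

NAME_TYPES = {"Personal", "Organizational"}  # controlled list from the answer above


def build_creator(name, name_type, affiliation=None):
    """Build one datacite:creator, rejecting nameType values outside the list."""
    if name_type not in NAME_TYPES:
        raise ValueError(f"nameType must be one of {sorted(NAME_TYPES)}, got {name_type!r}")
    creator = ET.Element(f"{{{DATACITE_NS}}}creator")
    ET.SubElement(creator, f"{{{DATACITE_NS}}}creatorName", nameType=name_type).text = name
    if affiliation:
        ET.SubElement(creator, f"{{{DATACITE_NS}}}affiliation").text = affiliation
    return creator


def build_creators(entries):
    """entries: iterable of (name, nameType, affiliation or None)."""
    creators = ET.Element(f"{{{DATACITE_NS}}}creators")
    for name, name_type, affiliation in entries:
        creators.append(build_creator(name, name_type, affiliation))
    return creators


creators = build_creators([
    ("Evans, R.J.", "Personal", "Institute of Science and Technology"),
    ("Canadian Association of Research Libraries", "Organizational", None),
])
print(ET.tostring(creators, encoding="unicode"))
```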

For Data Archives, this is currently not supported. However, Data Guidelines are being updated (see alpha version). The new version will support the distinction between organizational and personal creator names.

Is it possible to include conference names? Which guideline element should be used?

There is currently no metadata element for conference names in the OpenAIRE Guidelines.

Can we use a non-controlled value in the alternateIdentifierType attribute of the Alternate Identifier element?

No. It is not possible to use a local non-controlled identifier. The identifier provided must pertain to one of the controlled list values of the alternateIdentifierType attribute.

Can I include more than one CitationTitle (e.g. for an item in a book that is part of a series)?

At the moment, it is not possible to include more than one citation. This will most likely change in an updated version of the guidelines, possibly through the introduction of a container element.

However, it is recommended to use the Related Identifier field. Its allowed relation types include “IsPartOf”, so one could express that a publication is part of a book (ISBN) as well as part of a series (ISSN); the field is repeatable (0-n). This is not redundant work: the identifier is what is used to create a relation in the OpenAIRE graph, while the citation elements are just the human-readable version of this information.
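The Python sketch below shows the repeatable IsPartOf pattern for a chapter that belongs to both a book and a series. The element and attribute names follow the DataCite Related Identifier property as we understand it; the namespace URI, the function name and the ISBN/ISSN values are placeholders for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumed DataCite kernel-4 namespace URI; verify against the guidelines.
DATACITE_NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("datacite", DATACITE_NS)


def build_is_part_of(isbn=None, issn=None):
    """Build datacite:relatedIdentifiers with one IsPartOf entry per container.

    The element is repeatable (0-n), so a chapter can point to its book (ISBN)
    and to the book's series (ISSN) at the same time.
    """
    related = ET.Element(f"{{{DATACITE_NS}}}relatedIdentifiers")
    for id_type, value in (("ISBN", isbn), ("ISSN", issn)):
        if value:
            ET.SubElement(
                related,
                f"{{{DATACITE_NS}}}relatedIdentifier",
                relatedIdentifierType=id_type,
                relationType="IsPartOf",
            ).text = value
    return related


# Placeholder identifiers for illustration only.
element = build_is_part_of(isbn="978-3-16-148410-0", issn="2049-3630")
print(ET.tostring(element, encoding="unicode"))
```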

In the EPrints software, how can we implement the Embargo Period Date?

For EPrints, Concordia University has developed and shared code that enables compliance with the mandatory elements of the OpenAIRE Literature Guidelines. The code and its documentation can be found on the project's GitHub page.

Should we work with COAR Resource Types version 2.0, released in July 2019, or can we use the former version 1.1?

Following best-practice rules, concepts defined in the COAR vocabularies will not disappear or change their semantics (except for refinements) in future releases. If the set of concepts in v1.1 is sufficient for your needs, it is safe to use it. If v1.1 lacks a concept you need that is included in v2.0, use the latest version.

Does OpenAIRE have plans to support DataCite schema 4.3?

Yes, OpenAIRE supports, or will support, the DataCite schema 4.3.

How can I suggest changes or improvements to the guidelines?

Suggestions for additional properties or vocabulary terms in the OpenAIRE Literature Guidelines V4 application profile are always welcome. To make a suggestion, please create an issue at https://github.com/openaire/guidelines-literature-repositories/issues or submit a pull request at https://github.com/openaire/guidelines-literature-repositories/pulls.

Registering your repository with OpenAIRE

Why should I register with OpenAIRE?

The only way to be harvested by OpenAIRE is to register your repository. OpenAIRE does not harvest from repositories that are not registered with their service.

How do I register my repository with OpenAIRE?

To register, you first need to create a user account in the OpenAIRE Provide dashboard. Then, it is strongly recommended that you run tests with the OpenAIRE online validator available in your OpenAIRE Provide dashboard account (choose “compatibility -> validate” from the menu). If your repository passes validation (no errors), you can register or update it from the “sources” menu. If you register a “literature” repository, information from OpenDOAR is reused; if you register a “data” repository, information from re3data is reused and can be completed or modified to some extent.

Next, enter the “interface” information, which includes the OAI base URL, the OpenAIRE guidelines against which you want to validate your repository and, optionally, the OAI set used for validation, which will later be used for harvesting.
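Before registering, it can help to check by hand that your OAI base URL answers the requests a harvester will make. The Python sketch below only builds OAI-PMH request URLs. The base URL and set name are hypothetical, and the metadataPrefix oai_openaire is our understanding of the prefix commonly exposed by Guidelines v4 endpoints; confirm it in your own repository's ListMetadataFormats response.

```python
from urllib.parse import urlencode


def oai_url(base_url, verb, **params):
    """Build an OAI-PMH request URL; parameters whose value is None are dropped."""
    query = {"verb": verb}
    query.update({key: value for key, value in params.items() if value is not None})
    return f"{base_url}?{urlencode(query)}"


BASE = "https://repository.example.ca/oai/request"  # hypothetical OAI base URL

print(oai_url(BASE, "Identify"))
print(oai_url(BASE, "ListMetadataFormats"))
# "openaire" is a hypothetical set name; omit set= to harvest everything.
print(oai_url(BASE, "ListRecords", metadataPrefix="oai_openaire", set="openaire"))
```

Opening these URLs in a browser is a quick way to see whether the endpoint responds, supports the expected metadata format and exposes the set you plan to enter in the dashboard.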

Once you click “next,” the registration process starts which includes a validation process.

You will be informed by email about the status of your registration.

Validating Your Repository

The OpenAIRE Validator Service performs validation checks on both the quality of the OAI-PMH protocol implementation and the conformance of the metadata. It is a rule-based system that also provides an admin panel, allowing users to easily configure the individual validation rules and the sets of rules that implement the guidelines. OpenAIRE uses this service to validate all of its registered content providers (literature repositories, OA publishers, data repositories/archives, aggregators, CRIS systems) against the OpenAIRE guidelines.

Can I use a test server for validation prior to validating the production data?

Yes. You can use the online validator service in the OpenAIRE Provide dashboard on a test server. If your test server is IP-protected, you will need to contact OpenAIRE () to obtain the IP addresses to add to your server's whitelist.

When using the OpenAIRE online validation tool, I get an error message “0 out of 100 records validated.” Is there a maximum number of records that can be harvested at one time?

This is likely caused by one or more records not being compliant with the OpenAIRE Guidelines, or by an issue with how the local records are exposed. OpenAIRE confirms that it is able to validate at least 5000 records at a time.
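When troubleshooting a “0 out of 100” result, a first step is to confirm that the endpoint returns records at all. The Python sketch below summarizes one OAI-PMH ListRecords response page: the number of records, any protocol errors (for example cannotDisseminateFormat when the requested metadataPrefix is not supported), and the resumptionToken used to fetch the next page. The function name is hypothetical, and the sketch does not check record-level compliance, which is what the validator does.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def summarize_list_records(xml_text):
    """Summarize one OAI-PMH ListRecords response page."""
    root = ET.fromstring(xml_text)
    errors = [(e.get("code"), (e.text or "").strip()) for e in root.findall("oai:error", NS)]
    records = root.findall("oai:ListRecords/oai:record", NS)
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return {
        "errors": errors,
        "record_count": len(records),
        # An empty (or absent) resumptionToken marks the last page of a list.
        "resumption_token": (token or "").strip() or None,
    }
```

A page with zero records and a non-empty errors list usually points to a configuration problem (wrong prefix or set) rather than to individual records.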

My repository is or will be harvested via the Canada Research aggregator. Will OpenAIRE provide a direct link to the content in my repository or only the link to my record in Canada Research?

Canada Research will provide OpenAIRE the provenance information of harvested records, allowing OpenAIRE to provide a direct link to the content in the original repository.

My repository is not categorized as being a Canadian repository. How can I change this?

OpenAIRE undertakes an automated matching process between repository and institution. It uses PIDs to map organizations, but as most institutions do not have PIDs, inaccurate matches sometimes occur. If this happens, contact OpenAIRE support ().

Harvesting Your Records

How does OpenAIRE deduplicate records in its aggregation?

OpenAIRE performs deduplication of organizations and publications by identifying matching elements:

For articles, via:

PIDs
Title, date, abstracts
Language
Licence
Publisher

For author name, via:

PIDs, like ORCID

Whenever the deduplication algorithm finds duplicates of the same publication, all information from all of the duplicates is kept. OpenAIRE keeps track of the provenance of information (i.e. if it has been inferred by the mining algorithm, if it has been claimed by authenticated portal users or if it was present in the metadata record collected from a data source).

When OpenAIRE identifies a potential duplicate, an event is logged in the broker notification service. The repository manager then decides whether to accept or reject the duplicate notification. Duplicate records are not deleted; instead, OpenAIRE merges them and enriches the already existing records.
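To make the merge-and-keep-provenance idea concrete, here is a deliberately simplified Python sketch. It is not OpenAIRE's algorithm: the match key (normalized DOI, falling back to normalized title plus year), the record fields and the function names are all hypothetical, and the broker notification step is not modelled. It does show the two properties described above: duplicates are merged rather than discarded, and every value and its source are kept.

```python
import re
from collections import defaultdict

DOI_PREFIX = "https://doi.org/"


def dedup_key(record):
    """Hypothetical match key: normalized DOI if present, else normalized title + year."""
    doi = (record.get("doi") or "").strip().lower()
    if doi.startswith(DOI_PREFIX):
        doi = doi[len(DOI_PREFIX):]
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return ("title-year", title, (record.get("date") or "")[:4])


def merge_duplicates(records):
    """Group records by key and merge each group, keeping every value and its sources."""
    groups = defaultdict(list)
    for record in records:
        groups[dedup_key(record)].append(record)
    merged = []
    for group in groups.values():
        combined = {"provenance": [r["source"] for r in group]}
        for record in group:
            for field, value in record.items():
                if field != "source" and value:
                    combined.setdefault(field, set()).add(value)
        merged.append(combined)
    return merged
```

One limitation of this toy key: a record with a DOI and a copy without one will not be matched, which is one reason real systems combine several signals (PIDs, title, date, abstract, publisher).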

How does OpenAIRE enhance the metadata records from my repository?

OpenAIRE runs inference algorithms that enrich the aggregation with additional information, extracted from the publications’ full texts, for the following elements:

Subjects
Links to datasets
Links to projects
Links to research communities and infrastructures
Links to publications (i.e. similar publications)
Links to software
Links to biological entities (e.g. PDB)
Citations

Is it possible to remove a record from the OpenAIRE aggregation? (e.g. for copyright reasons, a record must be removed and all online access deleted). If so, how does OpenAIRE do this?

Yes. OpenAIRE removes records when they are no longer included in the records harvested from the data provider. To make sure a record is permanently removed, OpenAIRE suggests that repositories use a persistent deletion strategy, ensuring that the datestamps of updated or deleted records are updated accordingly. This way, with incremental harvesting, records deleted in the repository will no longer be available through OpenAIRE. If you need to remove many records, or need to remove a record urgently, you can contact OpenAIRE directly () so that it can perform a manual harvest and refresh its results.
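In OAI-PMH, a repository declares its deletion policy in the deletedRecord element of its Identify response (no, transient or persistent) and signals each deletion with a record header carrying status="deleted" and a new datestamp, which an incremental harvest picks up. The Python sketch below, with hypothetical function names and sample data, checks both so you can confirm a withdrawn item is actually exposed as deleted.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
NS = {"oai": OAI}


def deleted_record_policy(identify_xml):
    """Return the deletedRecord value ("no", "transient" or "persistent") from Identify."""
    root = ET.fromstring(identify_xml)
    return root.findtext("oai:Identify/oai:deletedRecord", namespaces=NS)


def deleted_identifiers(list_xml):
    """Return (identifier, datestamp) pairs for headers flagged status="deleted"."""
    root = ET.fromstring(list_xml)
    pairs = []
    for header in root.iter(f"{{{OAI}}}header"):
        if header.get("status") == "deleted":
            pairs.append((header.findtext("oai:identifier", namespaces=NS),
                          header.findtext("oai:datestamp", namespaces=NS)))
    return pairs


# Hypothetical ListIdentifiers page containing one live and one deleted record.
listing = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
    '<header><identifier>oai:example:1</identifier><datestamp>2021-06-01</datestamp></header>'
    '<header status="deleted"><identifier>oai:example:2</identifier>'
    '<datestamp>2021-07-05</datestamp></header>'
    '</ListIdentifiers></OAI-PMH>'
)
print(deleted_identifiers(listing))
```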

Technical issues with OpenAIRE compliance

How can the repository expose metadata in more than one language?

The attribute xml:lang can be used in any element of the OpenAIRE Literature Repository Guidelines V4.

Example for 1. Title (M):

<datacite:title xml:lang="eng">Land use planning for disaster risk management</datacite:title>

<datacite:title xml:lang="fra" titleType="TranslatedTitle">Planification de l’utilisation des terres pour la gestion des risques de catastrophes</datacite:title>

<datacite:title xml:lang="spa" titleType="TranslatedTitle">Planificacion del uso de la tierra a favor de la gestion del riesgo de desastres</datacite:title>
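A minimal Python sketch that produces titles like the example above with ElementTree follows. The reserved xml:lang attribute is written using its qualified name in the XML namespace, which ElementTree serializes as xml:lang. The function name, the datacite:titles wrapper and the DataCite namespace URI are assumptions to check against the guidelines.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"  # assumed namespace URI
ET.register_namespace("datacite", DATACITE_NS)
# ElementTree serializes this qualified name as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_titles(main, translations=()):
    """main and each translation are (language code, title text) pairs."""
    titles = ET.Element(f"{{{DATACITE_NS}}}titles")
    lang, text = main
    ET.SubElement(titles, f"{{{DATACITE_NS}}}title", {XML_LANG: lang}).text = text
    for lang, text in translations:
        ET.SubElement(
            titles,
            f"{{{DATACITE_NS}}}title",
            {XML_LANG: lang, "titleType": "TranslatedTitle"},
        ).text = text
    return titles


titles = build_titles(
    ("eng", "Land use planning for disaster risk management"),
    [("fra", "Planification de l’utilisation des terres pour la gestion des risques de catastrophes")],
)
print(ET.tostring(titles, encoding="unicode"))
```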

Is it possible to link to several files (e.g. suppl., errata, image) or only to the main manuscript?

You can choose to send either the main manuscript or all the files associated with an item in your repository. File links must be provided in the 23. File Location (MA) element. If you plan to provide multiple files, it is important to use one or more of the attributes described in the guidelines to distinguish each file.

Should the File Location element appear only for Open Access articles?

The file location can be provided regardless of the access rights status. However, the purpose of the File Location (MA) element is to provide the URL of the full text, along with the corresponding licence conditions, so that the full text can be processed for text and data mining.

An item in the repository may have multiple files with different access rights (e.g. the article is open access but the supplemental data are restricted). How should we deal with this?

It is possible to specify the access right at the file level using the accessRightsURI attribute of the File location element.

For example, one item containing both an open access article in PDF and restricted supplemental data would be managed as follows:

<oaire:file accessRightsURI="http://purl.org/coar/access_right/c_abf2" mimeType="application/pdf" objectType="fulltext">http://link-to-the-fulltext.org/article.pdf</oaire:file>

<oaire:file accessRightsURI="http://purl.org/coar/access_right/c_16ec" mimeType="text/csv" objectType="other">http://link-to-the-fulltext.org/supdata.csv</oaire:file>
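The same two file entries can be generated with the Python sketch below, which sets the access right per file from a small lookup of the two COAR URIs used in the example above. The oaire namespace URI is our assumption of what the v4 guidelines declare, and the function name and short access keys (“open”, “restricted”) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed oaire namespace URI from the OpenAIRE v4 guidelines; verify before use.
OAIRE_NS = "http://namespace.openaire.eu/schema/oaire/"
ET.register_namespace("oaire", OAIRE_NS)

# COAR access right URIs used in the example above (hypothetical short keys).
COAR_ACCESS = {
    "open": "http://purl.org/coar/access_right/c_abf2",
    "restricted": "http://purl.org/coar/access_right/c_16ec",
}


def build_file(url, access, mime_type, object_type):
    """Build one oaire:file element with file-level access rights."""
    element = ET.Element(
        f"{{{OAIRE_NS}}}file",
        {
            "accessRightsURI": COAR_ACCESS[access],
            "mimeType": mime_type,
            "objectType": object_type,
        },
    )
    element.text = url
    return element


files = [
    build_file("http://link-to-the-fulltext.org/article.pdf", "open", "application/pdf", "fulltext"),
    build_file("http://link-to-the-fulltext.org/supdata.csv", "restricted", "text/csv", "other"),
]
for f in files:
    print(ET.tostring(f, encoding="unicode"))
```

Generating the attributes from one lookup avoids the quoting slips that are easy to make when such lines are edited by hand.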

How should I deal with multiple files in the same record that have different licenses?

OpenAIRE does not currently support licensing at the level of the individual file (in File Location element).

If there is no text in a mandatory element field, will that record be rejected by OpenAIRE?

A record would only be rejected if it has no title or no date. For all other fields, empty values are allowed. For the title, you could enter “No title available”. For the date, a value must be estimated: if it is not possible to identify the exact date, a four-digit year should be estimated.
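As a hedged illustration, a repository could run a pre-export check like the Python sketch below. It fills an empty title with the “No title available” placeholder suggested above, but refuses to invent a date, since a year has to be estimated by a person. The function name and the dictionary-based record shape are hypothetical, and the date check assumes ISO 8601-style values.

```python
import re

# Hypothetical pre-export check: title and date are the only elements whose
# absence causes rejection. Assumes ISO 8601-style dates (YYYY[-MM[-DD]]).
DATE_PATTERN = re.compile(r"^[0-9]{4}(-[0-9]{2}){0,2}$")
PLACEHOLDER_TITLE = "No title available"


def prepare_record(record):
    """Return a copy of record that will not be rejected for a missing title or date."""
    prepared = dict(record)
    if not (prepared.get("title") or "").strip():
        prepared["title"] = PLACEHOLDER_TITLE
    date = (prepared.get("date") or "").strip()
    if not DATE_PATTERN.match(date):
        # A date cannot be invented automatically; someone must estimate a year.
        raise ValueError(f"record {prepared.get('id')!r}: estimate at least a four-digit year")
    return prepared
```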

Credits:
This initial project was undertaken under the leadership of the ORWG’s Task Group on OpenAIRE, led by Pierre Lasou (Laval University), and the Ad Hoc COVID-19 Working Group, led by Kathleen Shearer (CARL and COAR). Other participants were Lise Brin (CARL), Corey Davis (CARL), Danoosh Davoodi (University of Alberta), Sharon Fennel (University of Alberta), Jordan Hale (University of Waterloo), Geoff Harder (University of Alberta), Yoo Young Lee (University of Ottawa), Lindsey MacCallum (Mount Saint Vincent University), Courtney Earl Matthews (Queen’s University), Gabriela Mircea (McMaster University), Tomasz Neugebauer (Concordia University), Kelly Stathis (Portage Network-Canadian Association of Research Libraries), Andrea Szwajcer (University of Manitoba), and Mita Williams (University of Windsor).