Last updated: July 5, 2021
The FAQs below have been developed by the ORWG to support harvesting of Canadian repositories by OpenAIRE. If you do not find the answers to your questions below, OpenAIRE hosts content provider community calls on the first Wednesday of every month where you can ask your questions. They also publish a newsletter or you can contact the Helpdesk directly.
If you would like to contribute your own questions to this FAQ, please send them to the CARL Open Repositories Working Group’s OpenAIRE support team (firstname.lastname@example.org).
Implementing OpenAIRE Metadata Guidelines
To be OpenAIRE compliant and be harvested by OpenAIRE, a repository must adopt the OpenAIRE guidelines. These guidelines define how the repository should expose its metadata records. Some effort is required by repositories to become OpenAIRE compliant. A decision tree was developed to help you decide whether implementing the OpenAIRE guidelines is feasible for you. If not, you can have your repository harvested by the Canada Research aggregator.
The OpenAIRE Guidelines define how metadata is exposed via the OAI-PMH protocol in order to integrate with OpenAIRE infrastructure. OpenAIRE has guidelines for different types of data providers:
- OpenAIRE Guidelines for Literature, institutional, and thematic Repositories
- OpenAIRE Guidelines for Data Archives
- OpenAIRE Guidelines for CRIS Managers
- Draft OpenAIRE Guidelines for Software Repository Managers
- Draft OpenAIRE Guidelines for Other Research Products
In Canada, the focus has been limited to the OpenAIRE Guidelines for Literature, institutional, and thematic Repositories. Portage FRDR is already compliant with the OpenAIRE Guidelines for Data Archives.
Adopting the guidelines requires some technical adjustments in your repository, which may differ depending on which repository platform you are using. There is technical documentation available for DSpace, Bepress and EPrints. See the decision tree to help guide you through the process, which links to the technical documents.
The minimum metadata requirements for OpenAIRE compliance for literature repositories are outlined in the Canadian Lite OpenAIRE Literature Repository Guidelines.
I’m not sure I can implement the OpenAIRE Guidelines in my repository. Can I still participate in the OpenAIRE project?
To be harvested by OpenAIRE, you do not need to implement the full guidelines. The minimum implementation level is outlined in the Canadian Lite OpenAIRE Literature Repository Guidelines. If you are not able to implement the minimum metadata requirements, you can instead be harvested by the Canada Research aggregator, which is, in turn, harvested by OpenAIRE. This, however, means that you will not benefit from the OpenAIRE enhancement of your records or from the text and data mining that identifies funder affiliations.
Canada Research is a metadata aggregator based at McMaster University Library. Canada Research harvests records from non-compliant repositories and transforms their metadata so that OpenAIRE can, in turn, harvest them directly from Canada Research. This document outlines the workflow for repositories to be harvested by Canada Research. You can also curate your records in the Canada Research aggregation to add relevant funder information.
If I do not have information for a specific metadata element, should I use a generic value (e.g. unknown)?
OpenAIRE recommends leaving fields empty if no information about a specific element is available, rather than entering placeholder values. The exceptions are the date and title elements, which are required to have values (see more information about date below).
Yes, you still can. One of the main objectives of OpenAIRE is to support tracking of funders’ research outputs. OpenAIRE uses a number of techniques to identify funder information, including text and data mining of full-text articles for funder names, and integrating and enhancing repository records from other sources, such as Crossref. However, it is not always possible to find the funder affiliation using these techniques. Some records in OpenAIRE will therefore be missing the funder affiliation and will not be visible in the funder-specific searches.
A date must be estimated. It is not possible to use values such as “unknown” or “inferred” in date elements. Dates need a 4-digit year; month and day are optional.
Dates are used in the OpenAIRE Guidelines as defined in the DataCite Schema 4.3. Available is defined as follows: “The date the resource is made publicly available”. OpenAIRE uses 3 different dates that must be specified using the dateType attribute:
- One for Publication Date (M): dateType is Issued
- Two for the Embargo Period Date (MA):
  - To indicate the START of an embargo period, dateType is Accepted.
  - To indicate the END of an embargo period, dateType is Available.
As the publication date is often used as a reference to define an embargo period, the Issued date and the Accepted date are often the same.
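The date rules above can be sketched in Python as follows. This is a minimal illustration, not an official OpenAIRE tool: the namespace URI, the `datacite:dates` wrapper element, the helper names `is_valid_date` and `build_dates`, and the sample dates are all assumptions made for the example. The check covers the format only (4-digit year, optional month and day), not calendar validity.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace URI for the DataCite kernel-4 schema.
DATACITE_NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("datacite", DATACITE_NS)

# A 4-digit year is required; month and day are optional.
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def is_valid_date(value: str) -> bool:
    """Return True for YYYY, YYYY-MM or YYYY-MM-DD strings (format check only)."""
    return DATE_PATTERN.fullmatch(value) is not None


def build_dates(issued: str,
                embargo_start: Optional[str] = None,
                embargo_end: Optional[str] = None) -> ET.Element:
    """Build date elements: Issued (publication), Accepted/Available (embargo)."""
    dates = ET.Element(f"{{{DATACITE_NS}}}dates")
    typed_values = (
        ("Issued", issued),           # publication date
        ("Accepted", embargo_start),  # START of the embargo period
        ("Available", embargo_end),   # END of the embargo period
    )
    for date_type, value in typed_values:
        if value is None:
            continue
        if not is_valid_date(value):
            raise ValueError(f"{date_type} date needs a 4-digit year: {value!r}")
        element = ET.SubElement(
            dates, f"{{{DATACITE_NS}}}date", {"dateType": date_type}
        )
        element.text = value
    return dates


if __name__ == "__main__":
    # Hypothetical article published on 2021-03-15 with a one-year embargo.
    example = build_dates("2021-03-15", "2021-03-15", "2022-03-15")
    print(ET.tostring(example, encoding="unicode"))
```

Placeholder strings such as "unknown" are rejected by `is_valid_date`, which mirrors the requirement that a date be estimated rather than left as a generic value.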
If the date used for an article in the repository is the date of acceptance of the article, can it be used in Publication Date?
If only the date of acceptance is available, and not the date the item was released, issued or published, the date of acceptance can be used instead. The dateType attribute must still be Issued. For the dateTypes Accepted and Available, OpenAIRE follows the semantics of DataCite Schema v4.3 (see p. 39), reserving these dateType values to indicate embargo periods.
For literature repositories, corporate and personal creators can be distinguished using the nameType attribute and its controlled list values, for example:
<datacite:creatorName nameType="Personal">Evans, R.J.</datacite:creatorName>
<datacite:affiliation>Institute of Science and Technology</datacite:affiliation>
For Data Archives, this is currently not supported. However, Data Guidelines are being updated (see alpha version). The new version will support the distinction between organizational and personal creator names.
There is currently no metadata element for conference names in the OpenAIRE Guidelines.
Can we use a non-controlled value in Alternate Identifier element alternateIdentifierType attribute?
No. It is not possible to use a local non-controlled identifier. The identifier provided must pertain to one of the controlled list values of the alternateIdentifierType attribute.
At the moment, it is not possible to include more than one citation. Most likely, this will change in an updated version of the guidelines, which would introduce a container element.
However, it is recommended to make use of the Related Identifier field. The allowed attributes include “IsPartOf”. With that, one could portray the relation of a publication being part of a book (ISBN) as well as a series (ISSN) (the field is repeatable, 0-n). This work would not be preliminary or redundant, as it is the identifier that is used to create a relation in the graph. The citation elements are just the human-readable version of this information.
In EPrints, Concordia University has developed and shared its code to enable compliance for mandatory elements of the OpenAIRE Literature Guidelines. The code and its documentation can be found on the GitHub project page.
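As a hedged sketch of the Related Identifier approach described above, the Python below builds one repeatable element per container (book and series), each with the relation “IsPartOf”. The namespace URI, the `datacite:relatedIdentifiers` wrapper, the attribute names, the function name `build_is_part_of` and the placeholder ISBN/ISSN values are assumptions for illustration; check the guidelines for the exact element definitions.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the DataCite kernel-4 schema.
DATACITE_NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("datacite", DATACITE_NS)


def build_is_part_of(identifiers):
    """Build one relatedIdentifier per (type, value) pair, all with IsPartOf."""
    wrapper = ET.Element(f"{{{DATACITE_NS}}}relatedIdentifiers")
    for id_type, value in identifiers:
        element = ET.SubElement(
            wrapper,
            f"{{{DATACITE_NS}}}relatedIdentifier",
            {"relatedIdentifierType": id_type, "relationType": "IsPartOf"},
        )
        element.text = value
    return wrapper


if __name__ == "__main__":
    # Placeholder identifiers, not real publications.
    related = build_is_part_of([
        ("ISBN", "978-0-00-000000-0"),  # the book containing the chapter
        ("ISSN", "0000-0000"),          # the series the book belongs to
    ])
    print(ET.tostring(related, encoding="unicode"))
```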
Should we work with COAR Resource Type Version 2.0, which was released in July 2019, or can we use the former version 1.1?
Following best-practice rules, concepts defined in the COAR vocabularies will not disappear or change their semantics (except for refinements) in future releases. If the set of concepts in v1.1 is sufficient for your case(s), it is safe to use it. If v1.1 lacks a concept you need that is contained in v2.0, use the latest version.
Yes, OpenAIRE does/will also support the DataCite schema 4.3.
Suggestions regarding additional properties or vocabulary terms in the OpenAIRE Literature Guideline V4 application profile are always welcome. To make one, please create an issue at https://github.com/openaire/guidelines-literature-repositories/issues or open a pull request at https://github.com/openaire/guidelines-literature-repositories/pulls.
Registering your repository with OpenAIRE
The only way to be harvested by OpenAIRE is to register your repository. OpenAIRE does not harvest from repositories that are not registered with their service.
How do I register my repository with OpenAIRE?
To register, you first need to create a user account in The OpenAIRE Provide dashboard. Then, it is strongly recommended that you run tests with the OpenAIRE online validator available in your OpenAIRE Provide dashboard user account (choose from menu “compatibility->validate”). If you pass the validation process (no errors) you can register or update your repository from the menu “sources”. If you register a “literature” repository, information from OpenDOAR is reused; if you register a “data” repository information from re3data is reused and can be completed or modified to some extent.
Next, enter the “interface” information, which includes the OAI base URL, the OpenAIRE guidelines you want to validate your repository against and, optionally, the OAI set used for validation, which will later be used for the harvesting.
Once you click “next,” the registration process starts which includes a validation process.
You will be informed by email about the status of your registration.
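The “interface” information maps onto the parameters of a standard OAI-PMH ListRecords request. The sketch below shows how the base URL, metadata format and optional set combine into a request you can try yourself before registering. The default metadata prefix `oai_openaire`, the base URL and the set name are assumptions for illustration; use the values your repository actually exposes.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_openaire",
                     oai_set: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL from the interface settings."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set:
        # The set is optional; without it the whole repository is listed.
        params["set"] = oai_set
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical repository endpoint and set name.
    print(list_records_url("https://repository.example.org/oai/request",
                           oai_set="openaire"))
```

Opening the resulting URL in a browser is a quick way to confirm that the endpoint responds before running the online validator.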
Validating Your Repository
The OpenAIRE Validator Service performs validation checks on both the quality of implementation of the OAI-PMH protocol and the conformance of the metadata. It is a rule-based system that also provides an admin panel, allowing users to easily configure the individual validation rules and the sets of rules that implement the guidelines. OpenAIRE uses this service to validate all its registered content providers (literature repositories, OA publishers, data repositories/archives, aggregators, CRIS systems) against the OpenAIRE guidelines.
Can I use a test server for validation prior to validating the production data?
You can use the online validator service inside the OpenAIRE Provide dashboard on a test server. If your test server is IP-protected, you will need to contact OpenAIRE (email@example.com) to obtain their IP addresses to use in your server whitelist.
When using the OpenAIRE online validation tool, I get an error message “0 out of 100 records validated.” Is there a maximum number of records that can be harvested at one time?
This is likely caused by one or more records not complying with the OpenAIRE Guidelines, or by an issue with how the local records are exposed. OpenAIRE confirms that it is able to validate at least 5000 records at a time.
My repository is or will be harvested via the Canada Research aggregator. Will OpenAIRE provide a direct link to the content in my repository or only the link to my record in Canada Research?
Canada Research will provide OpenAIRE the provenance information of harvested records, allowing OpenAIRE to provide a direct link to the content in the original repository.
My repository is not categorized as being a Canadian repository. How can I change this?
OpenAIRE undertakes an automated matching process between repository and institution. It uses PIDs to map organizations, but as most institutions do not have PIDs, inaccurate matches sometimes occur. If this happens, contact OpenAIRE support (firstname.lastname@example.org).
Harvesting Your Records
How does OpenAIRE deduplicate records in its aggregation?
OpenAIRE performs de-duplication of organizations and publications by identifying matching elements:
For articles, via:
- Title, date, abstracts
For author name, via:
- PIDs, like ORCID
Whenever the deduplication algorithm finds duplicates of the same publication, all information from all of the duplicates is kept. OpenAIRE keeps track of the provenance of information (i.e. if it has been inferred by the mining algorithm, if it has been claimed by authenticated portal users or if it was present in the metadata record collected from a data source).
When OpenAIRE identifies a potential duplicate, an event is logged in the broker notification service. The repository manager then needs to decide whether to reject or accept the duplicate notification. Duplicate records are not deleted; instead, OpenAIRE merges the records and enriches the already existing records.
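To give a feel for what matching on title and date can look like, here is a deliberately simplified Python sketch. It is not OpenAIRE's deduplication algorithm, which this FAQ does not describe in detail: the normalization rules, the dictionary record layout and the function names `title_key` and `group_candidates` are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def title_key(title: str, date: str):
    """Normalize a title (case, accents, punctuation) and keep the year only."""
    decomposed = unicodedata.normalize("NFKD", title)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    normalized = re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()
    return normalized, date[:4]


def group_candidates(records):
    """Return lists of record ids that share a normalized title and year."""
    groups = defaultdict(list)
    for record in records:
        groups[title_key(record["title"], record["date"])].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Hypothetical records from two different sources.
    sample = [
        {"id": "a", "title": "Land Use Planning: A Review", "date": "2019-05-01"},
        {"id": "b", "title": "land use planning - a review", "date": "2019"},
        {"id": "c", "title": "Another title", "date": "2019"},
    ]
    print(group_candidates(sample))
```

Consistent titles and dates in your own metadata make it more likely that records describing the same publication are recognized as such.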
How does OpenAIRE enhance the metadata records from my repository?
OpenAIRE runs inference algorithms to enrich the aggregation with additional information extracted from the publications’ full texts, adding the following elements:
- Links to datasets
- Links to projects
- Links to research communities and infrastructures
- Links to publications (i.e. similar publications)
- Links to software
- Links to biological entities (e.g. PDB)
Is it possible to remove a record from the OpenAIRE aggregation? (e.g. for copyright reasons, a record must be removed and all online access deleted). If so, how does OpenAIRE do this?
Yes. OpenAIRE removes records when they are no longer included in the records harvested from the data provider. To be sure the record is permanently removed, OpenAIRE suggests that repositories use a persistent deleting strategy, ensuring that the date stamps of updated or deleted records are updated accordingly. This way, with incremental harvesting, records deleted in the repository will no longer be available through OpenAIRE. If you need to remove a lot of records or need to remove a record urgently, you can contact OpenAIRE directly (email@example.com), so that it can perform a manual harvest and refresh results.
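In OAI-PMH, a repository that tracks deletions reports a deleted record with `status="deleted"` on the record header, together with an updated datestamp. The Python sketch below shows how a harvester could pick such records out of a ListRecords response. The sample response, the identifiers and the function name `deleted_identifiers` are hypothetical and only illustrate the mechanism.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, trimmed ListRecords response with one deleted record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header status="deleted">
        <identifier>oai:repository.example.org:123</identifier>
        <datestamp>2021-06-30T12:00:00Z</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:repository.example.org:456</identifier>
        <datestamp>2021-06-29T09:30:00Z</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def deleted_identifiers(xml_text: str):
    """Return the identifiers of all records flagged as deleted."""
    root = ET.fromstring(xml_text)
    return [
        header.findtext("oai:identifier", namespaces=OAI_NS)
        for header in root.iterfind(".//oai:header[@status='deleted']", OAI_NS)
    ]


if __name__ == "__main__":
    print(deleted_identifiers(SAMPLE_RESPONSE))
```

If your repository's responses never contain such headers after a deletion, an incremental harvest has no way to learn about the removal.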
Technical issues with OpenAIRE compliance
How can the repository expose metadata in more than one language?
The attribute xml:lang can be used in any element of the OpenAIRE Literature Repository Guideline V4.
Example for 1. Title (M)
<datacite:title xml:lang="eng">Land use planning for disaster risk management</datacite:title>
<datacite:title xml:lang="fra" titleType="TranslatedTitle">Planification de l’utilisation des terres pour la gestion des risques de catastrophes</datacite:title>
<datacite:title xml:lang="spa" titleType="TranslatedTitle">Planificacion del uso de la tierra a favor de la gestion del riesgo de desastres</datacite:title>
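A quick way to check that multilingual titles are exposed as intended is to parse the record and list the titles by language. The Python sketch below does this for a trimmed fragment; the namespace URI, the `datacite:titles` wrapper and the function name `titles_by_language` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the DataCite kernel-4 schema.
DATACITE_NS = "http://datacite.org/schema/kernel-4"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = f"""<datacite:titles xmlns:datacite="{DATACITE_NS}">
  <datacite:title xml:lang="eng">Land use planning for disaster risk management</datacite:title>
  <datacite:title xml:lang="fra" titleType="TranslatedTitle">Planification de l’utilisation des terres pour la gestion des risques de catastrophes</datacite:title>
</datacite:titles>"""


def titles_by_language(xml_text: str):
    """Map each xml:lang code to the text of the corresponding title."""
    root = ET.fromstring(xml_text)
    return {
        title.get(XML_LANG): title.text
        for title in root.iter(f"{{{DATACITE_NS}}}title")
    }


if __name__ == "__main__":
    for language, title in titles_by_language(SAMPLE).items():
        print(language, "->", title)
```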
Is it possible to link to several files (e.g. suppl., errata, image) or only to the main manuscript?
You can decide to send either the main manuscript or all the files associated with an item in your repository. File links must be provided in the 23. File Location (MA) element. If you plan to provide multiple files, it is important to use one or more of the attributes described in the guidelines to distinguish each file.
Should the File Location element appear only for Open Access articles?
The file location can be provided regardless of the access right status. However, the purpose of the File Location (MA) element is to provide the URL of the full text and the corresponding licence conditions so that the full text can be processed for text and data mining.
It can happen that an item in the repository has multiple files with different access rights (e.g. the article is open access but the supplemental data are restricted). How should we deal with it?
It is possible to specify the access right at the file level using the accessRightsURI attribute of the File Location element.
For example, one item containing both an open access article in PDF and restricted supplemental data would be managed as follows:
<oaire:file accessRightsURI="http://purl.org/coar/access_right/c_abf2" mimeType="application/pdf" objectType="fulltext">http://link-to-the-fulltext.org/article.pdf</oaire:file>
<oaire:file accessRightsURI="http://purl.org/coar/access_right/c_16ec" mimeType="text/csv" objectType="other">http://link-to-the-fulltext.org/supdata.csv</oaire:file>
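To verify that file-level access rights come out as expected, the two elements above can be parsed and the COAR URIs mapped to readable labels. The Python below is a minimal sketch: the namespace URI, the `record` wrapper element, the label mapping (limited to the two URIs used in this example) and the function name `file_access_rights` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the oaire prefix.
OAIRE_NS = "http://namespace.openaire.eu/schema/oaire/"

# Labels for the two COAR access right URIs used in the example above.
ACCESS_LABELS = {
    "http://purl.org/coar/access_right/c_abf2": "open access",
    "http://purl.org/coar/access_right/c_16ec": "restricted access",
}

# The wrapping <record> element is hypothetical; it only makes the fragment parseable.
SAMPLE = f"""<record xmlns:oaire="{OAIRE_NS}">
  <oaire:file accessRightsURI="http://purl.org/coar/access_right/c_abf2" mimeType="application/pdf" objectType="fulltext">http://link-to-the-fulltext.org/article.pdf</oaire:file>
  <oaire:file accessRightsURI="http://purl.org/coar/access_right/c_16ec" mimeType="text/csv" objectType="other">http://link-to-the-fulltext.org/supdata.csv</oaire:file>
</record>"""


def file_access_rights(xml_text: str):
    """Return (file URL, access label) pairs for every oaire:file element."""
    root = ET.fromstring(xml_text)
    return [
        (
            file_el.text,
            # URIs outside the small mapping above are reported as "unknown".
            ACCESS_LABELS.get(file_el.get("accessRightsURI"), "unknown"),
        )
        for file_el in root.iter(f"{{{OAIRE_NS}}}file")
    ]


if __name__ == "__main__":
    for url, label in file_access_rights(SAMPLE):
        print(label, "->", url)
```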
How should I deal with multiple files in the same record that have different licenses?
OpenAIRE does not currently support licensing at the level of the individual file (in File Location element).
If there is no text in a mandatory element field, will that record be rejected by OpenAIRE?
It would only be rejected if there is no title or no date. For all other fields, empty values are allowed. For the title, you could enter “No title available”. For the date, one must be estimated: if it is not possible to identify the exact date, a 4-digit year should be estimated.
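The rule above is simple enough to check locally before exposing records. The Python sketch below flags records that lack a title or a date and accepts empty values everywhere else. The dictionary record layout and the function name `rejection_reasons` are hypothetical, and the check is an illustration rather than a replacement for the OpenAIRE validator.

```python
def rejection_reasons(record: dict):
    """List the reasons a record would be rejected; an empty list means none."""
    reasons = []
    # Only title and date must carry a value; other fields may be empty.
    for field in ("title", "date"):
        value = record.get(field) or ""
        if not value.strip():
            reasons.append(f"missing {field}")
    return reasons


if __name__ == "__main__":
    # Hypothetical records: the first passes, the second lacks a date.
    print(rejection_reasons({"title": "No title available", "date": "2015", "subject": ""}))
    print(rejection_reasons({"title": "A thesis", "date": ""}))
```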