Forum Data Publication and Availability #5: Persistent Identifiers

Brief Report | 15 December 2022

By Franziska Fritzsche, M. A. and Eva Bodenschatz, M. Sc.

The task area Data Publication and Availability (Task Area 4) of the consortium for research data on material and immaterial cultural heritage hosted its fifth event, an open online forum on the topic of "Persistent Identifiers" (PIDs), on 17 November 2022 from 9 am to 2.30 pm. It focused in particular on the question of which types of persistent identifiers are suitable for repositories, data services or image galleries, and how transparency can be established for users and among infrastructure services within the Culture community.

A persistent identifier is a permanent and unique reference to a digital resource, thereby making it findable and citable. As there are different methods for referencing digital objects and resources, external service providers and initiatives specialising in permanent identification for digital research data and outputs presented their solutions in keynote lectures. Participants representing research, GLAM and infrastructure facilities took the opportunity to ask specific questions and to discuss the application and provision of PIDs in humanities and cultural studies.

After the two Co-Spokespersons Dr Maria Effinger and Dr Jens Bove opened the event, Robert Ulrich (KIT) gave an introduction to persistent identifiers and re3data, the registry of research data repositories. Persistent identifiers play an important role throughout the entire research data cycle, which explains the wide range of PIDs. The challenge common to all of them is to enable unique and sustainable references to (digital and physical) objects as well as their interoperable use. Combining different identifiers in particular creates additional value, e.g. DOI (Digital Object Identifier) for works, ORCID (Open Researcher and Contributor ID) for persons and ROR (Research Organization Registry) for institutions. re3data is a generic registry of research data repositories, which was created in a collaborative effort by an international working group and is managed by DataCite. An international team of editors assists with indexing repositories in re3data and reviews metadata as well as the implementation of the FAIR principles.
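To illustrate how DOI, ORCID and ROR can sit side by side in one workflow, the following minimal Python sketch tells the three identifier types apart by their syntax. It was not part of the presentation. The regular expressions are commonly cited patterns and are assumptions here, they check form only and not registration, and the function names are hypothetical. The ORCID check digit follows the ISO 7064 MOD 11-2 scheme that ORCID documents.

```python
import re

# Syntax-only patterns (assumptions): they do not prove that an identifier is registered.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
ROR_RE = re.compile(r"^0[a-hj-km-np-tv-z0-9]{6}\d{2}$")

RESOLVER_PREFIXES = ("https://doi.org/", "https://orcid.org/", "https://ror.org/")


def strip_resolver(value: str) -> str:
    """Remove a leading resolver URL so that only the bare identifier remains."""
    value = value.strip()
    for prefix in RESOLVER_PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the final ORCID character with ISO 7064 MOD 11-2."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected


def classify_pid(value: str):
    """Return 'DOI', 'ORCID', 'ROR' or None for an unrecognised string."""
    bare = strip_resolver(value)
    if ORCID_RE.match(bare):
        return "ORCID" if orcid_checksum_ok(bare) else None
    if ROR_RE.match(bare):
        return "ROR"
    if DOI_RE.match(bare):
        return "DOI"
    return None


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation;
# the DOI and ROR strings are syntactic placeholders, not real records.
print(classify_pid("https://orcid.org/0000-0002-1825-0097"))
print(classify_pid("10.1234/example"))
print(classify_pid("https://ror.org/012345678"))
```

A repository could run such a check when metadata is entered, before looking the identifier up at the respective registry.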

Afterwards, Paul Vierkant talked about the non-profit organisation DataCite, an international consortium with the goal of providing easier access to scientific data and increasing its citability. It focuses on the use of DOIs to uniquely reference (digital or physical) resources and make them discoverable in different repositories. A metadata schema with partially controlled values ensures that the relation between research data and publication remains traceable and that the data can be found via search engines. One such PID search service is DataCite Commons, which aggregates search results via the use of DOI, ROR and ORCID.
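As a rough illustration of how such a metadata record keeps data and publication connected, the following Python sketch builds a small record and checks it. The property names follow our reading of DataCite's JSON representation and should be verified against the current schema documentation. All values are placeholders, and the helper functions are hypothetical.

```python
# Properties treated as mandatory in this sketch (assumption based on the DataCite schema).
REQUIRED = ("doi", "creators", "titles", "publisher", "publicationYear", "types")

# Placeholder record: the DOIs, title and publisher are invented for illustration.
# The ORCID iD is the sample identifier from ORCID's own documentation.
record = {
    "doi": "10.1234/example-dataset",
    "creators": [
        {
            "name": "Carberry, Josiah",
            "nameIdentifiers": [
                {
                    "nameIdentifier": "https://orcid.org/0000-0002-1825-0097",
                    "nameIdentifierScheme": "ORCID",
                }
            ],
        }
    ],
    "titles": [{"title": "Example image collection"}],
    "publisher": "Example Repository",
    "publicationYear": 2022,
    # "Dataset" is one of the controlled values for the general resource type.
    "types": {"resourceTypeGeneral": "Dataset"},
    # The relation type is a controlled value; it links the data set to an article.
    "relatedIdentifiers": [
        {
            "relatedIdentifier": "10.1234/example-article",
            "relatedIdentifierType": "DOI",
            "relationType": "IsSupplementTo",
        }
    ],
}


def missing_fields(rec: dict) -> list:
    """List the required properties that are absent or empty."""
    return [name for name in REQUIRED if not rec.get(name)]


def related_dois(rec: dict, relation: str) -> list:
    """Return the DOIs linked to the record through the given relation type."""
    return [
        item["relatedIdentifier"]
        for item in rec.get("relatedIdentifiers", [])
        if item.get("relatedIdentifierType") == "DOI"
        and item.get("relationType") == relation
    ]


print(missing_fields(record))
print(related_dois(record, "IsSupplementTo"))
```

Because the relation type comes from a controlled list, a search service can follow the link from a data set to its publication without having to interpret free text.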

Another international consortium providing PID services for digital objects is ePIC, which was introduced by Hon. Prof Dr Philipp Wieder (GWDG). To secure funding and infrastructure and to be able to respond directly to research needs, its services are supported by large consortia. The so-called ePIC ID, like the DOI, is based on an international handle system that contains semantic information. For the publication of research data, a persistent ID can be created via a web form. The generated identifiers are machine-readable and can be retrieved via a REST API. The corresponding working group of the Research Data Alliance (RDA) is currently looking into standardising metadata for PIDs.
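The machine-readable retrieval can be sketched in Python as follows. This is an assumption-laden example: it uses the public Handle.net proxy API rather than an ePIC-specific endpoint, which the talk summary does not describe. The handle, the target URL and the sample response are invented placeholders, and the function names are hypothetical. No network request is made, so that the example runs offline.

```python
from urllib.parse import quote

# Public proxy of the Handle System (assumption: ePIC services may offer their own endpoints).
HANDLE_PROXY = "https://hdl.handle.net/api/handles/"


def handle_api_url(handle: str) -> str:
    """Build the REST URL under which the proxy returns a handle record as JSON."""
    return HANDLE_PROXY + quote(handle, safe="/")


def target_urls(response: dict) -> list:
    """Extract the URL values from a handle record; responseCode 1 means success."""
    if response.get("responseCode") != 1:
        return []
    return [
        value["data"]["value"]
        for value in response.get("values", [])
        if value.get("type") == "URL"
    ]


# Canned response in the shape the proxy returns (placeholder content).
sample_response = {
    "responseCode": 1,
    "handle": "21.99999/example-object",
    "values": [
        {
            "index": 1,
            "type": "URL",
            "data": {
                "format": "string",
                "value": "https://repository.example.org/objects/42",
            },
        }
    ],
}

print(handle_api_url("21.99999/example-object"))
print(target_urls(sample_response))
```

In practice, the JSON would be fetched from the URL returned by handle_api_url, for example with urllib.request, and then passed to target_urls.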

Digital objects can also be made permanently citable with the URN Service (Uniform Resource Name) of the Deutsche Nationalbibliothek (DNB, German National Library), which Uta Ackermann and Stephanie Palek presented. URNs are assigned to digital objects that are self-contained in terms of content and are archived for the long term. The advantage of this identifier is the link to the corresponding metadata located in the library catalogue. The DNB's legally defined collection mandate stipulates that all works in the form of text, image and sound must be submitted as legal deposits - this also includes digital publications. For this purpose, the DNB assigns static URNs that cannot be changed afterwards. Partner institutions, such as universities and publishers, can also assign URNs independently. However, these are dynamic as they can be changed under certain circumstances. In this case, the identifier's persistence strongly depends on the reliability of the respective institution. As several URLs can sit behind a dynamic URN, the DNB offers a URN resolver service.
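The idea of several URLs behind one URN can be sketched in a few lines of Python. The example assumes that the resolver is addressed by appending the URN to https://nbn-resolving.org/. The URN, the target URLs and the in-memory table are invented placeholders that only stand in for the real resolver database, and the function names are hypothetical.

```python
RESOLVER_BASE = "https://nbn-resolving.org/"

# Toy stand-in for the resolver's database: one URN, several registered URLs.
REGISTERED_URLS = {
    "urn:nbn:de:example-0001": [
        "https://repository.example.org/item/1",
        "https://archive.example.org/item/1",
    ],
}


def resolver_link(urn: str) -> str:
    """Turn a URN:NBN into a citable resolver link."""
    if not urn.lower().startswith("urn:nbn:"):
        raise ValueError("not a URN:NBN: " + urn)
    return RESOLVER_BASE + urn


def resolve_locally(urn: str, table: dict) -> list:
    """Return every URL registered for the URN, or an empty list if unknown."""
    return list(table.get(urn, []))


print(resolver_link("urn:nbn:de:example-0001"))
for url in resolve_locally("urn:nbn:de:example-0001", REGISTERED_URLS):
    print(url)
```

The citation always uses the resolver link. Which of the registered URLs is finally served can then change without breaking the reference.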

Dr Janete Saldanha Bach (GESIS) then introduced the PID service da|ra by KonsortSWD. The consortium is part of the German National Research Data Infrastructure (NFDI) and develops services for research with data from social, behavioural, educational and economic sciences. In cooperation with the ZBW, GESIS operates the non-commercial registration agency da|ra to make social science research data permanently identifiable and available through DOIs. As PIDs usually refer to the entire digital object, it is difficult for researchers to directly cite individual parts of a data set. Therefore, it is possible to register variable PIDs in order to reference sub-elements and different levels of a data set. For this purpose, the service is being extended to ePIC's handle standard and uses DataCite's metadata standard. Another important feature is bulk processing for the registration of larger amounts of data.

Concluding the keynote lectures, Dr Melanie Gruß (Institute of Theatre Studies, Leipzig University) and Dr Desiree Mayer (SLUB Dresden) presented the significance of authority data in connection with PIDs. Authority data are structured data sets that are set up according to a set of rules and are scientifically verified. They have an unchangeable ID and are created for persons, corporate bodies, geographic names, events, concepts and works. It is recommended to use authority data curated and offered by national libraries. If research data publications include authority data, especially subject headings, in their metadata, these can be linked, thereby making the research output findable via its content.

The open plenary discussion that followed not only gave room for questions about the presentations but also provided an opportunity to discuss issues and challenges around persistent identifiers. Topics included:

Use of several PIDs for a single object
Permanent referencing of buildings and their components
Transparency within the community about the PIDs used
Experience with identifiers in relation to name referencing
Dealing with changes in metadata due to new research findings
Use of PIDs for different data types

The forum revealed that, depending on the need, there are already well-functioning digital solutions for permanently identifying, preserving and locating research data, objects and results. However, further action is needed to promote standardisation and dissemination of PIDs for research data and to enable research according to the FAIR principles. The strong interest in this particular subject was impressive and we would like to express our sincere thanks for the lively participation.

The minutes, including detailed information on the keynote lectures and the plenary discussion, as well as the released presentation slides can be found here. The slides on DataCite are also published on Zenodo.