Forum Report | 11 June 2025
Report: Forum “Measuring the quality of cultural data — but how? Concepts, methods, potentials and the NFDI4Culture approach”
By Alexander Faschon, M.A., Angela Kailus, M.A., and Dr. Celia Krause
Image: Workers measuring. Creator: Richard Peter jr.; owner: Deutsche Fotothek
Measuring data quality is a popular topic because high quality ensures compatibility and future viability. What methods can be used to measure and evaluate quality, especially for complex cultural data? This question was the focus of the online forum "Measuring the quality of cultural data — but how? Concepts, methods, potentials and the NFDI4Culture approach," held on May 19 and 20, 2025, by the NFDI4Culture task area "Standards, Data Quality, and Curation." The discussion was prompted by the German Research Foundation's requirement that all NFDI consortia report key figures on datasets that meet quality criteria defined by the consortium. The challenge now is to develop a suitable approach to assessing the data of our domain.
The event explored current perspectives on measuring data quality in culture-related domains, as well as the desired outcomes and challenges of managing the quality of cultural data. On the first day, stakeholders from research data management and research presented their strategies and experiences. On the second afternoon, these practical reports were followed by presentations on the NFDI4Culture approach.
Celia Krause (NFDI4Culture, DDK) opened with a thematic introduction that raised fundamental questions: What should be measured, how, why, and with which tools? After presenting the dimensions of data quality, Krause examined the core areas of data quality management. Her overview of existing general approaches to deriving metrics, of frameworks such as FAIR, and of tiered models for implementation laid the groundwork for the contributions that followed.
As Europe's largest aggregator of cultural data, Europeana plays a pivotal role in developing the Common European Data Space for Cultural Heritage. Henning Scholz (Europeana Foundation) discussed the portal's Publishing Framework, which allows for different levels of data quality in the form of a tiered model but also establishes a minimum standard. Scholz also presented the Metis Sandbox, a test environment in which data providers can assess the quality of their data before delivery. In the future, data quality assessments will increasingly include information that allows users to determine whether the data are suitable for their needs.
Under the umbrella of the European Open Science Cloud (EOSC), research data from all domains, including those represented in the NFDI, are to be brought together productively. In his presentation, Chris Schubert (EULiST, TU Vienna) emphasized the importance of a common understanding of data quality throughout the entire data life cycle. While the EOSC Data Quality Framework is a valuable resource, concrete implementation rules have yet to be specified because persistent governance structures are lacking. Schubert also identified the challenge of ensuring trustworthy data environments in light of current developments in artificial intelligence.
Johannes Schäffer and Magdalene Schlösser (Helmholtz Center for Cultural Technology, IfM, both NFDI4Objects) presented a field analysis of the state of data management, conceptions of data quality, and the implementation of the FAIR Principles in museums and collections in Germany. Owing to the diversity of institutions, a complex picture emerges regarding the use of controlled vocabularies and the application of the FAIR Principles. The analysis enables us to assess the situation of an entire sector that, with its heterogeneous practices and lack of central guidelines, is representative of our communities. Although awareness of FAIR practices and the use of standard vocabularies, common metadata formats, and viable software are increasing, the large number of in-house solutions remains difficult to connect to FAIR data practices. In this respect, collections that belong to networks and associations are significantly more successful.
Next, Anke Hofmann and Elisa Klar (both from the Library and Archive of the HMT Leipzig) reported on their research project. They presented CARLA, a database based on an analysis of the study records of people associated with the Leipzig Conservatory between 1843 and 1918. They focused on their measures for ensuring data quality, such as listing the database in relevant registries, linking via interfaces, using authority data, and indicating licenses, as well as on the planned expansion stages. This instructive report concluded the first day.
The second afternoon was devoted to NFDI4Culture's approach to measuring data quality in the repositories and data platforms provided by the consortium's partner institutions. After Desiree Mayer (NFDI4Culture, SLUB Dresden) summarized the first day of the event, Angela Kailus (NFDI4Culture, DDK) presented the current status of implementing data quality measurement. The approach was developed on the basis of an in-depth analysis of the data landscape across the five subject domains represented in the consortium, which currently comprises 79 data services. The analysis revealed an extraordinary range of subject areas, data types and formats, standards used, and degrees of FAIR implementation. We therefore began developing a detailed catalogue of FAIR criteria that enables us to categorise services according to recognised tiered models, in line with the FAIR Maturity Model or the Europeana approach. Applying the model at the macro level produces widely varying results: while awareness of FAIR is widespread, much work remains to be done. Measurement criteria and methods must be further differentiated.
Linnaea Söhn (NFDI4Culture, AdW Mainz) provided an in-depth look at the technical foundations for embedding data quality criteria and their assessment in NFDI4Culture. On the one hand, the Culture Knowledge Graph contains the research information of the NFDI4Culture portal in structured form and fully implements FAIR. On the other hand, it increasingly incorporates metadata on the holdings of repositories and data platforms. This enables comprehensive, research-driven queries across the data holdings. Torsten Schrade (NFDI4Culture, AdW Mainz) presented the Culture Knowledge Graph as a data analysis tool, using the "Italian data journey" as an example. Drawing on the Partitura project (DHI Rome), in which numerous opera scores were digitized, he demonstrated how data producers can optimize the quality of their data for federated data analysis and thereby increase its knowledge potential.
The event concluded with outlooks from Melanie Gruß (NFDI4Culture, SLUB Dresden) and Angela Kailus. First, they summarised the progress NFDI4Culture has made in supporting the community in implementing the FAIR Principles and measuring data quality. They then outlined the planned expansion and refinement of data collection with regard to quality criteria, as well as improved analysis methods in the form of automated evaluation procedures. The results will also form the basis for a more specific, micro-level assessment of the quality of individual data holdings. This will make it possible to initiate targeted curation measures and to ensure the sustainable quality of cultural data at its source. Measuring data quality thus becomes a diagnostic tool that provides transparent results benefiting data providers and researchers alike. It is crucial that concepts for measuring and ensuring data quality remain straightforward and accessible.
Overall, the feedback on the NFDI4Culture approach was positive. Participants named the diversity of recording practices, redundancies in holdings, and insufficient persistence of identifiers as the greatest challenges to ensuring data quality. They asked practical questions about how to improve their data quality, how to actively use the consortium's tools for quality management, and how the Culture Knowledge Graph works overall. It was suggested that demo videos be made available to all communities. The high number of participants on both days, 175 on the first and 127 on the second, reflected the relevance of the topic. We would like to thank all speakers and participants for their contributions.
The presentations are available for download here.