Workshop Series "Four Steps to more Sustainability. Data Quality Strategies for Museums and Collections"
Museums, archives and comparable scholarly collections face the challenge of continually reassessing the reusability of the holdings data they have curated over long periods of time. In addition to serving internal purposes such as collection management, loan transactions or exhibition planning, the data also have to be suitable for new and promising scenarios of digital re-use beyond the individual institution. In this capacity, they are increasingly perceived as research data that should be usable in accordance with the FAIR Data Principles. For these reasons, it is becoming important to understand data quality in terms of interoperability and sustainable usability. This in turn places new demands on systems and workflows, and not least on the data literacy of staff.
On this topic, the NFDI4Culture task area "Standards, Data Quality, Curation" (TA 2), together with the task area "Cultural Research Data Academy" (TA 6) and the Documentation Section of the German Museums Association, organised a four-part open online workshop series for cultural heritage collections focusing on maintaining and improving the quality of collection data. The aim of the events was to introduce digital managers and curators, as well as those involved in project and process management or data publication, to the various aspects of data quality and strategies for effective planning, implementation, monitoring and subsequent improvement of data quality, and to encourage them to initiate and strengthen relevant initiatives in their institutions.
The offer was well received: more than 220 people registered, with more than 100 attending each workshop. In order to better understand participants' starting points and motivations, a questionnaire was included with registration. Afterwards, participants were able to evaluate each workshop by means of a feedback questionnaire and to express further needs and suggestions. An Etherpad was also used extensively to exchange ideas and information.
Workshop: "Making data quality tangible. Quality criteria and purposes of data use"
The first workshop (16 June 2023) began by presenting the different manifestations of data quality in general. After an introduction by Celia Krause (Task Area 2) to the principles of quality management, the categories of data quality and the most important concepts for ensuring quality through standards, guidelines (FAIR Principles), strategies, process management and operationalisation, the "collections as data" paradigm from the Anglo-American context was explained. The following talk by Julia Rössel (Deutsches Dokumentationszentrum für Kunstgeschichte - Bildarchiv Foto Marburg) presented typical data quality problems that were investigated in the KONDA project. She pointed out strategies for identifying, classifying and analysing these problems in order to develop appropriate solutions, and illustrated this with an example. During the subsequent group work on the online whiteboard, the participants were encouraged to identify and classify quality problems in their institution and to define target groups, products and key work steps for good data quality. This was done on the basis of their prioritised re-use scenarios, such as improved publication of holdings, use by researchers, Linked Open Data, networking, publication on other platforms and better searchability of data.
Workshop: "Effective management for better data quality"
The second workshop (23 June 2023) focused on the use of data management plans (DMPs). DMPs can help to plan and run research projects, but they can also serve as effective planning and management tools to ensure the quality of collection data on an ongoing basis. In an introductory presentation, Celia Krause highlighted data quality as a management task at the three levels of institution, project and data. She also outlined the requirements that collection data must meet today and then presented the data management plan as a tool. Noreen Klingspor (Württemberg State Museum) and Jana Hoffmann (Museum für Naturkunde Berlin) then described the practical implementation of successful data management in their institutions. Despite very different approaches, in both cases the maintenance of defined data quality levels is an integral part of digitally oriented future planning for the entire museum. Both presentations prompted many questions about the feasibility and implementation of various aspects of digital strategies through management practices. In the second part of the event, the topic was first explored in depth through a collaborative exercise to better understand data management plans. Participants then worked in small groups on the online whiteboard to develop such a plan based on a given task.
Workshop: "Setting the course for more quality: structuring object-related information, using controlled vocabularies"
In the third session (30 June 2023), the introductory talk by Angela Kailus (Task Area 2) showed how to set the course for sustainable collection documentation by designing an entity-relationship model in line with community standards. This was backed up by suggestions on how to structure the information to be recorded, how to deal with uncertain knowledge, and how to handle data provenance in accordance with good research practice. In a mapping exercise in several groups on the online whiteboard, the participants transferred existing object descriptions from different collections to the entity-relationship model presented earlier. After lively discussions, further presentations followed. Angela Kailus gave advice on practical work with controlled vocabularies, terminology mapping and the efficient use of local thesaurus modules for linking to external terminologies. Chiara Marchini (German Digital Library, DDB) presented the mandatory data element requirements for data suppliers to the DDB and then showed how more extensive datasets compliant with the extended core field set of the updated DFG Practical Guidelines on Digitisation can be used more effectively in the DDB while supporting the FAIR Data Principles. A minimum dataset recommendation for museums will be developed in the coming months. Lukas Städing (digiCULT Verbund e. G.) pointed out the advantages of working in a networked cooperative. He presented various digiCULT services that are available to the partners of the cooperative and that support the individual institutions considerably, including software for collection management, vocabulary services, consulting, and data publication via various interfaces.
Workshop: "Subsequent quality improvement: Analysing, cleaning and enriching data"
The fourth workshop (7 July 2023) was dedicated to the subsequent improvement of existing datasets. The introductory presentation by Angela Kailus made it clear how insufficient quality in collection data can affect an organisation's institutional processes and its range of services. The analysis starts with the work routines and experiences of the users. Based on this, data improvements can be planned and prioritised. Participants were particularly interested in the contribution of Hanna-Lena Meiners (DDK - Bildarchiv Foto Marburg, GND Pilotagentur Bauwerke), who presented OpenRefine, a powerful and flexible tool for data cleansing and enhancement, and gave a live demonstration of various cleansing methods. Michaela Grein (Übersee-Museum Bremen) reported on her museum's experience in an ambitious project to merge and make available its in-house data holdings, which was implemented using OpenRefine and integrated into the museum's continuous data quality management. This contribution was also well received. Finally, Joshua Enslin (Freies Deutsches Hochstift – Frankfurter Goethe-Haus) described the active data quality management of the museum-digital platform. It not only offers its partners specific support in producing better data, but also strengthens the exchangeability of the data (e.g. loan transactions with the LIDO application profile EODEM) and its reusability for research purposes through data enrichment, provision via APIs and LOD-based graph navigation.
The final discussion provided another opportunity for participants to evaluate the feasibility of the presented strategies and approaches to active data management and to share their own experiences. Many attendees found the workshops to be a valuable source of inspiration for further discussion of the topic, or were encouraged to take up the challenge of data quality in their own collection. In the exchanges, it became clear that some institutions are already well advanced in implementing their digital strategy, while others are still in the early stages. There is a strong willingness to share experiences. However, there is also a need for training and networking opportunities to support collections in the successful implementation of ongoing data quality management. The needs expressed provide a good starting point for NFDI4Culture to design such services or to refer to relevant third-party services.
The presentations, etherpads, whiteboards of the workshops and a reading list with additional materials are available for download here.