News Item | 24 March 2023
Data Integration of the Corpus Vitrearum Germany into the Culture Knowledge Graph
Contributed by Jonatan Jalle Steller
Integrating research data from the German Corpus Vitrearum into the NFDI4Culture knowledge graph was as easy as adding a bit of markup to our website, and it came with the added benefit of making our content more discoverable. The markup is designed to not just link content across projects based on features like location, historical period, and motif, but also to make it more machine-readable. With just a few extra lines of markup, the content was automatically added to Google Dataset Search and several popular image searches.
The extra markup needed for the content graph can be added directly into a website’s template or, for example, via plugins that let you add schema.org markup. Like many other websites running content management systems such as TYPO3, WordPress, Drupal, or MediaWiki, the Corpus Vitrearum website uses a flexible templating engine. I decided to add the markup directly into the page displaying the list of all pictures since this part of the template contained all the data we wanted to add to the content graph anyway. Using the basic example from the documentation of NFDI4Culture's Culture Graph Interchange Format (CGIF), we only needed to write a bit of extra code for a single required field that was not yet available in our template: the date when this particular data feed was last updated. For everything else, we merely had to insert the right templating code into the sample code.
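To illustrate the kind of markup described above, here is a minimal Python sketch that builds a schema.org data feed as JSON-LD and wraps it in a script tag for a page template. It is not the Corpus Vitrearum template code and has not been checked against the CGIF specification: the choice of `DataFeed`, `DataFeedItem`, and `ImageObject`, the helper names `build_feed` and `to_script_tag`, and the example.org URLs are all assumptions made for illustration. The one field the paragraph singles out, the date of the last update, is represented here by schema.org's `dateModified`.

```python
import json
from datetime import date


def build_feed(pictures, last_updated):
    """Return a schema.org DataFeed as a dict ready for JSON-LD output.

    `pictures` is a list of dicts with "url" and "name" keys (hypothetical
    input shape); `last_updated` is a datetime.date.
    """
    return {
        "@context": "https://schema.org",
        "@type": "DataFeed",
        # The feed-level date that had to be computed separately.
        "dateModified": last_updated.isoformat(),
        "dataFeedElement": [
            {
                "@type": "DataFeedItem",
                "item": {
                    "@type": "ImageObject",
                    "url": picture["url"],
                    "name": picture["name"],
                },
            }
            for picture in pictures
        ],
    }


def to_script_tag(feed):
    """Serialise the feed as a JSON-LD script element for an HTML page."""
    body = json.dumps(feed, indent=2, ensure_ascii=False)
    # Escape "</" so a value containing "</script>" cannot end the element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical sample data; real values would come from the CMS template.
sample_pictures = [
    {"url": "https://example.org/pictures/1", "name": "Sample window panel"},
    {"url": "https://example.org/pictures/2", "name": "Sample tracery light"},
]
sample_feed = build_feed(sample_pictures, date(2023, 3, 24))
print(to_script_tag(sample_feed))
```

In a real template the same structure would usually be written in the templating language itself, with the loop over pictures and the date coming from the content management system.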
The content graph does not ingest all the data that participating projects could provide, but focuses on a limited set of data that is relatively universal and can thus be used to link together content across different projects. Apart from a resource’s URL, this includes its rough type (image, audio, video), its name, its historical period, and keywords from controlled vocabularies, which include physical locations, associated people, and motifs. As the content graph grows, we can now use its querying interface (SPARQL) to provide related content from other research projects on our website.
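As a rough sketch of how related content might be fetched through a SPARQL interface, the following Python example builds a query for resources that share a controlled-vocabulary keyword and sends it using the standard SPARQL protocol over HTTP. The property names (`schema:keywords`, `schema:name`), the `http://schema.org/` namespace form, and the helper names `build_related_query` and `run_query` are assumptions for illustration. The actual graph model and endpoint address should be taken from the NFDI4Culture documentation, so the endpoint is left as a parameter.

```python
import json
import urllib.parse
import urllib.request


def build_related_query(keyword_uri, limit=10):
    """Build a SPARQL query for resources tagged with the given keyword URI.

    The graph pattern is an assumption: it presumes keywords are stored
    as URIs under schema:keywords.
    """
    # Reject characters that could break out of the <...> IRI syntax.
    if any(ch in keyword_uri for ch in '<>"{}|\\^` \n\t'):
        raise ValueError("keyword_uri contains characters not allowed in an IRI")
    return (
        "PREFIX schema: <http://schema.org/>\n"
        "SELECT ?item ?name WHERE {\n"
        f"  ?item schema:keywords <{keyword_uri}> ;\n"
        "        schema:name ?name .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def run_query(endpoint, query):
    """Send the query via HTTP GET and return the result bindings."""
    params = urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


# Hypothetical keyword URI; printing the query does not contact any server.
print(build_related_query("https://example.org/vocabulary/motif/1", limit=5))
```

Calling `run_query(endpoint, query)` with the graph's real endpoint would return a list of bindings, each holding the `item` and `name` values of one related resource.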
This is particularly important for a project engaged in preserving pictures of and knowledge about medieval and early modern stained glass in Germany. Website users may require contextual information from outside the project’s boundaries, such as how a specific motif compares to its depiction in other genres of the same period or how the German stained-glass tradition compares to those of other parts of the world. This goal is nearly impossible to achieve without a curated and trustworthy linking tool such as the content graph.
As we worked with NFDI4Culture to provide sample content, we quickly managed to adapt the proposed format to additional use cases like search engine optimisation (SEO). Since many search providers require the same schema.org markup as used in the CGIF, we added and documented a limited number of additional fields that allow the data to be ingested into further knowledge graphs, like the one built by Google. As a result, our website is not just listed in the academia-focused Dataset Search; our images are now also properly indexed in Google’s image search, which helps both professional and amateur users discover our project’s high-quality photographs.
We also presented our joint work with the NFDI4Culture teams "Knowledge Graph" and "Portal" at the DHd2023 conference. You can find the slides of the presentation with relevant links, illustrations and example SPARQL queries on Zenodo:
https://zenodo.org/record/7748740 (Knowledge Graph based research data integration in NFDI4Culture)