Title illustration: panel discussion at the 4th Culture Community Plenary

CC0 Creator: Sarah Pittroff

For the 4th Culture Community Plenary, we organised the panel discussion "Is this meta or can it go away? Which data (from your discipline) belongs in the knowledge graph?" The panel on 7 June 2024 consisted of Thomas Koprucki, WIAS Berlin (MaRDI), Sarah Ondraszek, FIZ Karlsruhe (NFDI4Memory), Harald Sack, FIZ Karlsruhe (NFDI4Culture), Harry Enke, AIP (PUNCH4NFDI), Jürgen Kett, DNB (Text+), Dirk Wintergrün, Klassik Stiftung Weimar (NFDI4Objects) and Karsten Ehms, Gesellschaft für Wissensmanagement e.V.

What data belongs in the knowledge graph?

What form of entities do we see?

From a humanities perspective, the question of what belongs in a knowledge graph often begins at the technical level, for example with which authority data, vocabularies and ontologies are particularly suitable. Comparing how the different consortia model their graphs shows right at the outset that a much more fundamental structural question has to be asked: Which entities do we model in knowledge graphs? Across all the presentations, the modelled entities can be divided into three categories. Nodes can be:

  • Data
  • Models
  • Rules

While NFDI4Objects and NFDI4Culture are mainly concerned with the first category, where the challenge lies in connecting the data semantically, MaRDI currently models different content in three different knowledge graphs. When it merges these graphs in the future, the consortium can count on attentive support from everyone involved!

What is primary data, what is secondary data?

Almost simultaneously, the question arises: What counts as data in the first place? For measurements the answer is primary data, but this often does not apply in the humanities: digitised and historical data are metadata, that is, secondary data that exist as enrichments of a digital twin of an original work. For knowledge graphs, the following was then established:

Primary data are the cross-connections, and they are in the public domain.

This is because the knowledge value lies explicitly in the edges of the graphs. These semantic connections become primary data in their own right in the knowledge graph. Unlike those of their big brother Google, they are in the public domain and always quality-assured. A lot of time, energy and therefore money goes into this structuring. The cost-benefit calculation follows immediately:

Expressiveness and structure

Karsten Ehms emphasises the two poles between which differently structured data sit. The general wish is for (research) data to be expressive. But the more precisely data is modelled, the more structured and the less free its input and output become. "And who wants to fill out forms all day?" is a question that is sure to provoke only a few raised hands among the scientific audience.

Jürgen Kett counters that the actual nodes of a knowledge graph are only part of the main interest. The primary data are the cross-connections between the nodes, and these are also in the public domain!
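
The point that the knowledge sits in the edges rather than in the nodes can be illustrated with a minimal sketch. It assumes a graph stored as plain subject-predicate-object triples. All identifiers and predicates below are invented for illustration and are not taken from any of the consortia's graphs.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); every name is made up.
TRIPLES = [
    ("painting:42", "created_by", "person:7"),
    ("person:7", "born_in", "place:weimar"),
    ("letter:3", "mentions", "painting:42"),
    ("letter:3", "written_by", "person:7"),
]


def neighbours(triples):
    """Index every edge in both directions so it can be followed from either node."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[subject].add((predicate, obj))
        index[obj].add((f"inverse:{predicate}", subject))
    return index


def connected(triples, start):
    """Return all nodes reachable from `start` by following edges in either direction."""
    index = neighbours(triples)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for _, other in index[node]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen - {start}


if __name__ == "__main__":
    # Starting from a place, the edges lead to a person, a painting and a letter.
    print(sorted(connected(TRIPLES, "place:weimar")))
```

Without the triples, the four nodes are just isolated identifiers. Only the edges make it possible to get from a place to a letter, which is one way to read the claim that the cross-connections are the primary data.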

Who benefits from knowledge graphs and who doesn't?

Up to that point, all panelists and the audience had agreed without exception that it is necessary to build a well-formed knowledge graph. In a turn of sorts in the middle of the panel, Harry Enke's contribution called into question whether the enormous effort is worthwhile at all. Data genesis in astrophysics involves numerous widely dispersed actors and sometimes takes a long time (historians turn pale when their time spans of a mere few thousand years shrink in the light of galactic distances, which have to be overcome for the transmission alone). For this field, Enke states that the primary data obtained is first processed into a meaningful structure by the respective research institutions and then corresponds to one particular research interest. The reusability of such data is low and its semantic self-description shallow, so it is of little interest to the wider research community. "The universe cannot be modelled conclusively," says Enke.

Karsten Ehms critically questions the benefits of knowledge graphs from the user's perspective. His practical experience in large and complex companies shows that the effort involved in creating structured data greatly reduces the willingness to make it available in the first place. In the long term, the permanent sharing of information works much better in very flat and low-threshold systems such as a wiki, in which knowledge can be stored in natural language, intuitively and without training.

Data publication is a scientific achievement

The advice to continue development with a keen eye on usability has been heard. It was also emphasised that contributing to the construction of the graph is not, and does not have to be, something done on the side. This should be linked to a further discussion: data publications need scientific credit. They are major achievements.

 

Presentations of the panel

Sarah Ondraszek, NFDI4Memory

Harry Enke, PUNCH4NFDI

Jürgen Kett, Text+

Dirk Wintergrün, NFDI4Objects

 

(This article was written in German and translated with AI.)