2nd Wikibase Workshop
On July 29, 2021, FIZ Karlsruhe and TIB Hannover invited researchers and data managers from NFDIs to participate in the 2nd Wikibase Workshop. The workshop first featured three presentations about current applications of Wikibase. The second half of the workshop focused on an interactive discussion with all participants on the requirements for installation and maintenance of Wikibase instances for research data projects, on ontology modeling for greater semantic expressivity within Wikibase, and on options for bulk data upload.
RaiseWikibase: Towards fast data import into Wikibase
Dr Renat Shigapov works at Mannheim University Library on the project "Business and Economics Research Data Center Baden-Württemberg", which is evolving into BERD@NFDI. He works on creating a knowledge graph of German companies using Wikibase. In this presentation, Dr Shigapov gave an overview of the current state of data upload tools and performance within the Wikibase environment, and introduced RaiseWikibase, a new Python tool for fast data import into Wikibase, developed within the context of the BERD project.
BERD@BW project page: www.berd-bw.de
BERD@NFDI project page: www.berd-nfdi.de
RaiseWikibase GitHub repo: github.com/UB-Mannheim/RaiseWikibase
Paper “RaiseWikibase: Fast inserts into the BERD instance”: doi.org/10.1007/978-3-030-80418-3_11
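For context on why fast import matters: the standard route into Wikibase is the MediaWiki action API, which costs one HTTP round trip per entity (RaiseWikibase's direct-database approach avoids exactly this overhead). The sketch below illustrates that standard per-item API route using the real `wbeditentity` action; the endpoint URL, labels, and CSRF token handling are illustrative assumptions, not part of the presentation.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Wikibase endpoint, for illustration only
API_URL = "https://example-wikibase.org/w/api.php"

def new_item_payload(label, description, lang="en"):
    """Build the JSON 'data' blob for a wbeditentity call creating one item."""
    return {
        "labels": {lang: {"language": lang, "value": label}},
        "descriptions": {lang: {"language": lang, "value": description}},
    }

def create_item(label, description, csrf_token):
    """POST one new item via the MediaWiki action API.

    Note: one HTTP round trip (plus server-side parsing and secondary
    data updates) per item -- the bottleneck that bulk-import tools
    like RaiseWikibase work around.
    """
    params = urllib.parse.urlencode({
        "action": "wbeditentity",
        "new": "item",
        "data": json.dumps(new_item_payload(label, description)),
        "token": csrf_token,
        "format": "json",
    }).encode()
    with urllib.request.urlopen(API_URL, data=params) as resp:
        return json.load(resp)
```

Uploading tens of thousands of items this way quickly becomes the dominant cost, which is the motivation for the benchmarks discussed in the talk.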
GND meets Wikibase
Thomas Bauer, a data scientist in the Office for Library Standards at the DNB (Deutsche Nationalbibliothek / German National Library), presented the ongoing work at DNB towards developing a Wikibase instance for the GND (Gemeinsame Normdatei). The goal for this development is to offer an appealing environment for non-librarian contributors to the GND in order to increase the diversity, quality, quantity and richness of its data. There was already a lightning talk about the GND at the last Wikibase workshop, but this presentation focused on a specific use case scenario currently in development (overview pages) and showed some results as well as hurdles.
GND on Twitter: @gndnet
The ArtBase Wikibase and updated query service
Dragan Espenschied is the Digital Preservation Director at Rhizome, an international digital arts organisation based in New York. Rhizome have been using Wikibase as the primary knowledge management software for their archive of born-digital art, the ArtBase, since 2016. Espenschied presented some of the specific customization work done on the ArtBase Wikibase to facilitate custom frontend presentation for art objects within MediaWiki pages while drawing on Wikibase Item data. He also showed the custom branded Wikibase Query Service interface that Rhizome released recently as a forked repository on Github.
Links from the presentation:
Rhizome’s ArtBase: https://artbase.rhizome.org/wiki/Main_Page
Example artwork page: https://artbase.rhizome.org/wiki/Q2508
Entity module template for constructing the artwork pages: https://artbase.rhizome.org/wiki/Module:entity
The ArtBase Query Service: https://query.artbase.rhizome.org/
GitHub repo for the Query Service: https://github.com/rhizomedotorg/artbase-query-gui/
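To give a sense of how a frontend like the one Espenschied described draws on Wikibase data, the sketch below queries a SPARQL endpoint and flattens the standard SPARQL JSON results format into plain rows. The endpoint URL and the property/item IDs in the query are placeholder assumptions, not the actual ArtBase data model.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and IDs, for illustration only
ENDPOINT = "https://query.example.org/sparql"
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P1 wd:Q2 .   # hypothetical: P1 = instance of, Q2 = artwork
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

def run_query(endpoint, query):
    """Fetch SPARQL JSON results over HTTP GET."""
    url = endpoint + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def bindings_to_rows(result):
    """Flatten the SPARQL 1.1 JSON results structure into a list of dicts,
    keeping only the bound values."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]
```

A page template (such as the ArtBase entity module) can then render rows like these into a custom presentation layer, which is the general pattern the talk demonstrated.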
The Wikibase Installation breakout session was moderated by Bayan Hilles (Partnerships team) and Thomas Arrow (Development team) from Wikimedia Germany, with significant contributions by Dragan Espenschied from Rhizome. The session focused on discussing challenges with the current options for configuring and deploying Wikibase on institutional infrastructure.
Some of the key issues already identified by NFDI partners concerned institutional deployment and security policies, which in some places prevent the use of the Docker distribution of Wikibase. This in turn leads to additional complications, since installing the non-containerized version of Wikibase requires a range of configurations which remain poorly documented.
The need for better documentation of the various installation and configuration steps was one of the key takeaways from this session. Participants also discussed having to define custom configuration settings in multiple, often undocumented, places; sometimes the same information must even be supplied several times, in varying formats. This led to the requirement for a single point of configuration where all custom variables needed in an institutional Wikibase, such as concept URIs, namespace prefixes, etc., can be set.
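One way to picture the "single point of configuration" requirement is a small script that keeps all instance-specific values in one central mapping and generates the scattered settings from it. The sketch below emits a fragment of MediaWiki's `LocalSettings.php`; the setting names shown (`$wgServer`, `$wgSitename`, the `conceptBaseUri` repo setting) are real, but the values and the overall approach are an illustrative assumption, not an existing tool.

```python
# One central mapping of instance-specific values (illustrative only)
SITE_CONFIG = {
    "server": "https://wikibase.example.org",
    "site_name": "Example Research Wikibase",
    "concept_base_uri": "https://data.example.org/entity/",
}

def render_local_settings(cfg):
    """Emit the LocalSettings.php lines that would otherwise be edited
    by hand in several places, all derived from one config mapping."""
    lines = [
        f'$wgServer = "{cfg["server"]}";',
        f'$wgSitename = "{cfg["site_name"]}";',
        f'$wgWBRepoSettings["conceptBaseUri"] = "{cfg["concept_base_uri"]}";',
    ]
    return "\n".join(lines)
```

The same mapping could equally feed a Docker environment file or a query service config, so a value such as the concept URI is defined exactly once.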
Lastly, for institutional deployments it is crucial to be able to maintain distinct testing and production environments, which leads to the requirement to be able to port a fully equipped Wikibase between developers' laptops and different online hosts, under different access URLs, with all features remaining intact. This remains a big challenge for the time being and contributes to confusion about the production-readiness of some Wikibase features.
The Data Upload breakout session was moderated by Lozana Rossenova and Lucia Sohmen from TIB’s Open Science Lab, with significant contributions by Renat Shigapov from Mannheim University Library. The session focused primarily on challenges with and requirements for bulk data upload, which is a significant part of the workflow of any institution intending to use Wikibase as a knowledge management tool.
The session outlined the main tools currently adopted by Wikibase users to prepare, reconcile and upload data to Wikibase. These included OpenRefine, QuickStatements, custom Python scripts, and RaiseWikibase. Participants reported errors with QuickStatements and a preference for OpenRefine in many cases, prompting a discussion around the need to add a reconciliation service that can be used by OpenRefine to the default Wikibase distribution.
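To make the tooling landscape concrete, the sketch below converts simple CSV rows of (subject, property, value) into the tab-separated command format of QuickStatements V1. It handles only item IDs and plain strings; real batches also need typed values such as language-tagged text and dates, so this is a simplified illustration rather than a complete converter.

```python
import csv
import io

def csv_to_quickstatements(csv_text):
    """Convert rows of (subject, property, value) into QuickStatements V1
    commands (tab-separated). Item IDs like Q42 pass through unchanged;
    anything else is treated as a string value and quoted."""
    out = []
    for subj, prop, value in csv.reader(io.StringIO(csv_text)):
        if not (value.startswith("Q") and value[1:].isdigit()):
            value = f'"{value}"'  # simplified: no dates, quantities, etc.
        out.append(f"{subj}\t{prop}\t{value}")
    return "\n".join(out)
```

Scripts like this are typical of the glue code participants described writing around QuickStatements and the API, which is part of why a built-in reconciliation and upload path was seen as desirable.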
When it came to validation or conformance checks, the majority of users either don’t have set tools and processes in place, or use a mix of manual approaches, maintenance scripts, and tools like Reasonator and Scholia. A key takeaway was that validation tools should be added as a default to Wikibase, and furthermore that validation should be possible against a specified schema – with the integration of shape expressions across various Wikibase services being an important aspect of this.
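The kind of schema-based validation discussed here can be sketched with a toy "shape" that lists required properties and minimum value counts; it is a deliberately simplified stand-in for a real shape expression (ShEx) schema, and the property IDs are hypothetical.

```python
# Toy shape: required properties -> minimum number of values.
# A stand-in for a real ShEx schema, for illustration only.
PERSON_SHAPE = {"P31": 1, "P569": 1}  # hypothetical: instance-of, date-of-birth

def validate(entity_claims, shape):
    """Check an entity's claims dict (property -> list of values) against a
    shape. Returns a list of human-readable problems; empty means conformant."""
    problems = []
    for prop, min_count in shape.items():
        count = len(entity_claims.get(prop, []))
        if count < min_count:
            problems.append(
                f"{prop}: expected at least {min_count} value(s), found {count}")
    return problems
```

A default validation service in Wikibase, as requested in the session, would run checks of this kind against a declared schema rather than relying on ad hoc maintenance scripts.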
The rest of the session focused on discussing two priority areas for NFDI partners: 1) the need to facilitate upload of ontologies including properties and class relations, and the lack of meaningful examples or best practices for mapping and uploading standard ontologies to Wikibase; and 2) the need to improve the performance of the Wikibase API, which at the moment is not sufficiently optimized for very large data set upload operations. The latter has been a point of heated debate in the Wikibase user community recently, leading to the creation of a dedicated Phabricator ticket, and a blog post by Wikibase & Wikidata Tech Lead, Adam Shorland.
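Until the API itself is faster, bulk uploaders typically throttle themselves using MediaWiki's real `maxlag` convention: requests carry a `maxlag` parameter, and when server replication lag exceeds it the API returns an error with code `maxlag`, signalling the client to back off and retry. A minimal retry wrapper might look like the following; the `post_fn` callable standing in for the actual API POST is an assumption of this sketch.

```python
import time

def post_with_maxlag(post_fn, params, max_retries=5, maxlag=5):
    """Call post_fn(params), where post_fn performs the actual API POST and
    returns the decoded JSON response. Backs off exponentially whenever the
    server reports replication lag via the standard 'maxlag' error code."""
    params = dict(params, maxlag=maxlag)
    for attempt in range(max_retries):
        resp = post_fn(params)
        if resp.get("error", {}).get("code") != "maxlag":
            return resp  # success, or an unrelated error for the caller
        time.sleep(2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError("server stayed lagged; giving up")
```

Backoff of this kind keeps a bulk upload polite, but it also caps throughput, which is precisely why the session identified API performance itself as the area needing improvement.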
The third breakout session discussed requirements for Wikibase from the ontology modeling perspective. It was moderated by Oleksandra Bruns, Tabea Tietz, Jonas Oppenländer and Harald Sack of the ‘Information Service Engineering’ research group at FIZ Karlsruhe. Participants came from diverse domains, including digital humanities, social science, materials science and archaeology. Most participants were either knowledge engineers, developers or project managers.
The session first outlined currently used tools for ontology modeling, the challenges that appear in the process, as well as features needed to improve ontology modeling for RDM (Research Data Management) in the future. One of the main challenges discussed was the need to establish common strategies and processes among knowledge engineers and domain experts. Furthermore, participants described the challenge of finding suitable ontologies for their domain of interest. Better support for collaborative ontology modeling was raised as one of the most needed features within existing tools, along with the ability to interconnect knowledge sources more efficiently.
After the more general discussion around ontology modeling, the breakout session then focused on features required specifically for Wikibase. The most pressing requirement for Wikibase raised in the session was the creation of a fully W3C-compliant Wikibase environment capable of importing existing external vocabularies with explicit and formal semantics that support federation and reasoning. Furthermore, the need for customizable and configurable forms for data input was raised, along with RDF-star support for reification and improved access and rights control.
The teams at FIZ Karlsruhe and TIB will continue to analyse in detail the individual challenges and requirements raised throughout all sessions of the workshop over the coming weeks, and will collaborate closely with Wikimedia Germany in defining which of these requirements can be turned into actionable tickets. The result of this analysis and collaboration will be a white paper to be published Open Access on the NFDI4Culture portal, outlining in more detail the specific requirements for the institutional use and future development of Wikibase from the point of view of the 4Culture consortium.