Upcycling the Dutch civil registry using Linked Data

Activity: Talk or presentation (Academic)


Civil certificates – registrations of birth, marriage, and death – contain a wealth of information on occupations, age at marriage, mortality, and more. This information is becoming available to scholars in the historical, life, and social sciences, as civil certificates have been scanned and manually transcribed by employees and volunteers of city, regional, and provincial archives in the Netherlands. A collaboration between the Central Bureau for Genealogy (CBG), the International Institute for Social History (IISH), and the CLARIAH project aims to upcycle the civil registry data by matching the names of individuals who appear on multiple certificates, in order to reconstitute families and life courses. The goal is to create a research-grade dataset in a FAIR format, with proper documentation and provenance information.

The project requires collaboration between socio-economic historians and computer scientists from multiple institutions. The CBG retrieves, merges, stores, and shares indexes of civil certificates from local archives. Currently, the IISH retrieves an Access to Archives (A2A) index from the CBG at irregular intervals. The civil certificate indexes are then loaded into a MySQL database and cleaned before being converted into Linked Data (LD). Because of the size of the data and the complexity of the matching procedure, burgerLinker first converts the LD to the HDT format and then runs a fast, scalable matching procedure based on a Levenshtein automaton. As a next step, the Civil Registries Reconstitutions Cleaner (C2RC) tool is being developed to apply a set of rules that semi-automatically validate clusters of certificates in which a single individual appears.
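
To give a rough sense of what approximate name matching across certificates involves, the sketch below links names that lie within a small edit distance of each other. It is not burgerLinker's method: it swaps the Levenshtein automaton for a plain dynamic-programming edit distance with an early cutoff, and compares every pair of names, which would not scale to the full registry. The function names, the certificate identifiers, and the sample names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def bounded_levenshtein(a: str, b: str, max_dist: int) -> Optional[int]:
    """Return the edit distance between a and b, or None if it exceeds max_dist."""
    # A length difference larger than the bound already rules out a match.
    if abs(len(a) - len(b)) > max_dist:
        return None
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        # Row minima never decrease, so we can stop once the bound is exceeded.
        if min(current) > max_dist:
            return None
        previous = current
    return previous[-1] if previous[-1] <= max_dist else None


def candidate_links(births: Dict[str, str],
                    marriages: Dict[str, str],
                    max_dist: int = 1) -> List[Tuple[str, str, int]]:
    """Pair birth and marriage certificates whose names are within max_dist edits."""
    links = []
    for birth_id, birth_name in births.items():
        for marriage_id, groom_name in marriages.items():
            dist = bounded_levenshtein(birth_name.lower(), groom_name.lower(), max_dist)
            if dist is not None:
                links.append((birth_id, marriage_id, dist))
    return links


# Hypothetical sample data: certificate id -> transcribed name.
births = {"b1": "Johannes Jansen", "b2": "Pieter de Vries"}
marriages = {"m1": "Johannes Janssen", "m2": "Willem Bakker"}
links = candidate_links(births, marriages)
print(links)
```

In practice, a name match alone would only yield a candidate link; further evidence from the certificates (for example dates or the names of parents) would be needed before treating two records as the same person.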

In our paper we will show how Linked Data’s focus on formal descriptions of data structures and open licenses incentivized us to adopt a community metadata standard, develop open-source software, and share results.
Period: 23 Nov 2023
Degree of recognition: International