TY - JOUR
T1 - The Value of Preexisting Structures for Digital Access
T2 - Modelling the Resolutions of the Dutch States General
AU - Koolen, Marijn
AU - Hoekstra, Rik
AU - Oddens, Joris
AU - Sluijter, Ronald
AU - Van Koert, Rutger
AU - Brouwer, Gijsjan
AU - Brugman, Hennie
N1 - Funding Information:
This research is funded by the Dutch Research Council (NWO) through the NWO Groot project REPUBLIC (REsolutions PUBLIshed in a Computational environment) 2019–2024.
Publisher Copyright:
© 2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2023/6/1
Y1 - 2023/6/1
N2 - The Resolutions of the Dutch States General (1576–1796) is an archive covering over two centuries of decision making and consists of a heterogeneous series of handwritten and printed documents. The archive, which has recently been digitised, is a rich source for historical research. However, owing to the archive’s heterogeneity and dispersion of information, historians and other researchers find it hard to use the archive for their research. In this article, we describe how we deal with the challenges of structuring and connecting the information in this archive. We focus on identifying the existing structural elements, to turn the archive from a set of pages into a set of meeting dates and individual resolutions, with rich metadata for each resolution. To deal with the challenges of historical language change, spelling variation, and text recognition mistakes, we exploit the repetitive nature of the language of the resolutions and use fuzzy string searching to identify structural elements by the formulaic expressions that signal their boundaries. We also discuss and provide an analysis of the value of extracting different types of entities from the text and argue that the choice of which types of entities to focus on should be made based on how they support relevant research questions and methods. In the resolutions, we choose to prioritise person qualifications such as profession, legal status, or title, over person names. Qualifications allow users to select certain groups of people and to meaningfully combine with other layers of metadata, whereas person names lack contextual information to disambiguate them, making it unclear which and how many persons are referred to by selecting a specific person name. We show how our methodology results in a computational platform that allows users to explore and analyse the archive through many connected layers of metadata.
AB - The Resolutions of the Dutch States General (1576–1796) is an archive covering over two centuries of decision making and consists of a heterogeneous series of handwritten and printed documents. The archive, which has recently been digitised, is a rich source for historical research. However, owing to the archive’s heterogeneity and dispersion of information, historians and other researchers find it hard to use the archive for their research. In this article, we describe how we deal with the challenges of structuring and connecting the information in this archive. We focus on identifying the existing structural elements, to turn the archive from a set of pages into a set of meeting dates and individual resolutions, with rich metadata for each resolution. To deal with the challenges of historical language change, spelling variation, and text recognition mistakes, we exploit the repetitive nature of the language of the resolutions and use fuzzy string searching to identify structural elements by the formulaic expressions that signal their boundaries. We also discuss and provide an analysis of the value of extracting different types of entities from the text and argue that the choice of which types of entities to focus on should be made based on how they support relevant research questions and methods. In the resolutions, we choose to prioritise person qualifications such as profession, legal status, or title, over person names. Qualifications allow users to select certain groups of people and to meaningfully combine with other layers of metadata, whereas person names lack contextual information to disambiguate them, making it unclear which and how many persons are referred to by selecting a specific person name. We show how our methodology results in a computational platform that allows users to explore and analyse the archive through many connected layers of metadata.
KW - text recognition
KW - data modelling
KW - Information extraction
KW - digital history
U2 - 10.1145/3575864
DO - 10.1145/3575864
M3 - Article
SN - 1556-4673
VL - 16
SP - 1
EP - 24
JO - Journal of Computing and Cultural Heritage
JF - Journal of Computing and Cultural Heritage
IS - 1
M1 - 1
ER -