Toward an ontology for archival resources. Modelling persons, objects and places in the Golden Agents research infrastructure

Activity: Talk or presentation (Academic)


The project Golden Agents: Creative Industries and the Making of the Dutch Golden Age analyses interactions between the various branches of the cultural industries, and the production and consumption of cultural goods in Amsterdam during the Dutch Golden Age. For the latter, it opens up the contents of the Amsterdam City Archives’ extensive collections of notarial deeds and of baptism, marriage, and burial registers, which provide insight into the households of Amsterdammers during the seventeenth and eighteenth centuries.

There are, however, some challenges in achieving this. The first challenge is conceptual: how to construct (partial) storylines about people and the (types of) objects they possessed or traded (Zamborlini and Betti 2017; Van den Heuvel and Zamborlini 2019; Van den Heuvel and Zamborlini forthcoming). The second challenge lies at the data level: how to decide when names or objects referenced in several documents are actually the same, that is, the disambiguation of mentions (Idrissou et al., 2019). In this paper we will first present a core part of the Reconstructions and Observations in Archival Resources model (ROAR++) as a first basis for a series of ontological and data models that structure the contents of the archival resources, with all their uncertainties, for the infrastructure of the Golden Agents project. This will be followed by a brief preview of an extensive publication on the ontology model Storylines of Historical Evidence, with an example of its relation to the ROAR++ model.


The ROAR++ model (Van Wissen et al., 2020) is being developed as an extension to ROAR (Den Engelse and Van Wissen, 2019). It relies on two abstract classes: Observation, which describes entities in a context, and Reconstruction, which combines several Observations of one entity. In addition, it provides a way to guide provenance documentation of archival resources by combining existing common vocabularies (e.g. PROV). We build on top of this model by (i) proposing complementary layers that can guide provenance to suit different requirements, and (ii) further specifying the notion of Observation. Finally, through Reconstructions, ROAR++ addresses the issue of disambiguation, while both notions align well with the original idea of constructing storylines.

The ROAR++ model comprises the following abstraction/interpretation layers:

1. A document or collection layer, which contains the data on the (sub)collections and taxonomy of the archive. This can be compared to the information originally displayed by the archive in an EAD (Encoded Archival Description) file.
2. An Observation layer:
   a. Content: covers annotations of entities in an archival document, without additional interpretation.
   b. Direct interpretation: covers the events and roles that are explicitly mentioned in a record, such as a baptism event with witnesses.
   c. Indirect interpretation: includes inferred events, such as the birth of the baptised child.
3. A Reconstruction layer, which spans multiple records and connects, where possible, the individual entities that were mentioned, interpreted or inferred.
Each of these layers can be described with minimal or detailed provenance. For example, one can simply state that an observation occurs in a certain document, or one can point to the exact text on which an observation is based. The full model addressing all the proposed layers is still under development. Here we focus on the concepts supporting layers 2 and 3 (see Figure 1).

Figure 1 – Core elements in the ROAR++ model
An Observation concerns entities such as Events or Roles as mentioned in a specific Document. Bearer entities (e.g. objects, agents, places) play or bear specific Roles (e.g. inventoried good, witness, home location) in the context of an Event (e.g. an auction or a baptism). The advantage of modelling entities as Roles is that they can be expressed with contingent qualities that are valid only at a specific point in time (e.g. someone’s age or status).
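The relation between Document, Event and Role described above can be sketched as plain data structures. This is a minimal illustration, not the ROAR++ ontology itself: the class and field names, the identifier scheme, and the example names are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    """An archival record, e.g. one entry in a baptism register."""
    identifier: str  # hypothetical identifier scheme


@dataclass(frozen=True)
class Event:
    """Something a document reports, e.g. a baptism or an auction."""
    event_type: str
    document: Document


@dataclass
class Role:
    """A bearer as it appears in one event, with contingent qualities."""
    role_type: str      # e.g. "witness" or "inventoried good"
    bearer_label: str   # the mention as written in the source
    event: Event
    # Qualities that hold only in the context of this event (age, status, ...).
    qualities: dict = field(default_factory=dict)


def roles_in(event, roles):
    """Return the role types observed for a given event."""
    return [r.role_type for r in roles if r.event == event]


# Invented example data, for illustration only.
record = Document("baptism-register/0001")
baptism = Event("baptism", record)
child = Role("baptised child", "Jan Pietersz", baptism)
witness = Role("witness", "Trijntje Jans", baptism, {"status": "widow"})
```

Because the qualities are attached to the Role and not to the bearer, a second observation of the same person in a later document can carry a different age or status without contradicting the first.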

Reconstructions are based on two or more Observations within or across documents, implying that two or more mentions refer to the same referent. This process is called disambiguation and can be achieved by instantiating an identity relation such as owl:sameAs, or a more complex reified equality relation that can be qualified with information on provenance, probability, validation by a human expert, and other relevant properties (Idrissou et al., 2019). Moreover, the outcome of such a process is stored in a format that allows for changes to the dataset when new information becomes available and that provides insight into the decisions taken in the disambiguation process. This also enables entity reconciliation internally and externally (e.g. with Getty’s ULAN).
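To illustrate the difference between a bare identity statement and a reified one, the sketch below stores each link with its score, method and validation status, and derives Reconstructions by grouping Observations connected through accepted links. The field names, the threshold and the acceptance rule are assumptions made for this example; they are not taken from ROAR++ or from Idrissou et al.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IdentityLink:
    """A reified 'same referent' claim between two observations."""
    source: str                       # observation identifier (hypothetical)
    target: str
    confidence: float                 # matcher score in [0, 1]
    method: str                       # rule or algorithm that proposed the link
    validated: Optional[bool] = None  # expert verdict; None if unreviewed


def reconstruct(observations, links, threshold=0.8):
    """Group observations into reconstructions via accepted links."""
    parent = {obs: obs for obs in observations}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for link in links:
        rejected = link.validated is False
        accepted = link.validated is True or link.confidence >= threshold
        if accepted and not rejected:
            parent[find(link.source)] = find(link.target)

    clusters = {}
    for obs in observations:
        clusters.setdefault(find(obs), set()).add(obs)
    return sorted(sorted(c) for c in clusters.values())


observations = ["obs1", "obs2", "obs3", "obs4"]
links = [
    IdentityLink("obs1", "obs2", 0.95, "exact name"),
    IdentityLink("obs2", "obs3", 0.60, "fuzzy name", validated=True),
    IdentityLink("obs3", "obs4", 0.90, "fuzzy name", validated=False),
]
print(reconstruct(observations, links))
# [['obs1', 'obs2', 'obs3'], ['obs4']]
```

Keeping the links separate from the resulting clusters means the grouping can simply be recomputed when a link is added, contested or validated, and each decision remains inspectable.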

Within the Golden Agents project, this approach is being applied to the extensive but disconnected collections provided by the Amsterdam City Archives, of which the notarial deeds are particularly relevant (Van Wissen et al., 2020). The archive indexes person names and some location names, and provides unprocessed text for parts of its notarial registries (i.e. probate inventories) from which we can extract objects. However, entities that are mentioned multiple times within or between indices are not disambiguated, although disambiguation is necessary in order to know who owned which cultural goods, when, and where in Amsterdam.
This confronts us with the challenge of establishing, for enormous numbers of documents, whether two or more mentions refer to the same entity (e.g. a person, object or place). Therefore, the Golden Agents project set up experiments with matching algorithms to disambiguate, among other things, person names with the new Lenticular Lenses tool developed by Al Idrissou et al. (2018 and 2019). This tool allows users to validate and/or contest the linkages made in the process. For archival documents, one can store information on the physical location of the reference in the original document, the type of document, its creation history, and the trustworthiness or (un)certainty of the actors involved. This is complemented by background information on the context in which these references were made.
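As a rough illustration of what a name-matching step can look like, the sketch below proposes candidate links between mentions whose normalised spellings are similar. It uses generic string similarity from the Python standard library and is not the matching method of the Lenticular Lenses tool; the function names, the threshold and the example mentions are assumptions made here.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_links(mentions, threshold=0.85):
    """Propose links between mention ids whose names are similar.

    mentions maps a mention id to the name as written in the index.
    The result is a list of (id_a, id_b, score) tuples that still
    need validation; a high score is not proof of identity.
    """
    links = []
    for (id_a, name_a), (id_b, name_b) in combinations(mentions.items(), 2):
        score = similarity(name_a, name_b)
        if score >= threshold:
            links.append((id_a, id_b, round(score, 2)))
    return links


# Invented mentions, for illustration only.
mentions = {
    "m1": "Jan  Pietersz",
    "m2": "jan pietersz",
    "m3": "Trijntje Jans",
}
```

Comparing every pair of mentions does not scale to a full archive, and a name alone rarely settles identity, so a realistic pipeline would restrict comparisons to plausible groups and take further evidence, such as dates or locations, into account.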
Storylines of Historical Evidence

ROAR++ forms part of the encompassing model called Storylines of Historical Evidence, the first ideas for which were discussed at previous Data for History meetings in Lyon, Galway, and Leipzig, and at the DH conference in Utrecht (Van den Heuvel and Zamborlini, 2019). In this model, all unstructured data derived from archival sources with information about persons, locations and cultural goods are organized so that they can be linked with the structured metadata on the creative industries of the Dutch Golden Age brought together in the Golden Agents infrastructure. The model has been worked out in a forthcoming publication that focuses on the modelling of time and historical processes, inspired by George Kubler’s Shape of Time (1962). In this paper we briefly refer to the parts of that ontological model that organize different views on sources of historical evidence, such as archival sources, using references to works of Rembrandt in documents of the Amsterdam City Archives.
It will be argued that opening up the Amsterdam City Archives and expressing their collections, indices, and the content of the inventory books in a model that both captures provenance information and facilitates the interpretation of the contents of the deeds is beneficial not only for the Golden Agents project. The proposed ontologies for ROAR++ and the Storylines of Historical Evidence, presented as UML diagrams, extend the Unified Foundational Ontology (UFO) (Guizzardi, 2015). They are provided with mappings to other existing ontologies such as PROV, FRBR, and CIDOC-CRM, and with a step-by-step guide to the modelling choices. As such, the models can also become highly relevant for Digital Humanities research into historical data.
Period: 13 Jan. 2021
Degree of recognition: International