Recognising and linking entities in Old Dutch text: A case study on VOC notary records

Barry Hendriks, Paul Groth, Marieke van Erp

Research output: Chapter in book/volume, Contribution to conference proceedings, Scientific, peer-review

3 Citations (Scopus)

Abstract

The increased availability of digitised historical archives allows researchers to discover detailed information about people and companies from the past. However, the unconnected nature of these datasets presents a non-trivial challenge. In this paper, we present an approach and experiments to recognise person names in digitised notary records and link them to their job registration in the Dutch East India Company’s records. Our approach shows that standard state-of-the-art language models have difficulties dealing with 18th-century texts. However, a small amount of domain adaptation can improve the connection of information on sailors from different archives.

Original language: English
Title of host publication: COLCO 2020
Subtitle of host publication: Collect and Connect: Archives and Collections in a Digital Age 2020. Proceedings of the International Conference Collect and Connect: Archives and Collections in a Digital Age, Leiden, the Netherlands, November 23-24, 2020.
Publisher: CEUR Workshop Proceedings
Pages: 25-36
Number of pages: 12
Volume: 2810
Publication status: Published - 2021

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR Workshop Proceedings
ISSN (Print): 1613-0073

Keywords

  • Domain adaptation
  • Maritime history
  • Named entity recognition
