The Semantics of Structure in Large Historical Corpora

Research output: Contribution to conference > Paper > Scientific > peer-review

Abstract

There are two approaches to structuring historical corpora that are too large to process manually. The first is an inductive method that extracts implicit entities and meaning from textual (and sometimes visual) content. With the help of AI or existing, manually compiled lists of entities, the extracted entities are converted into information. The second, which Colavizza (2019) calls referential information systems, takes existing reference systems (such as archival indexes) and uses them to contextualize individual documents. Both methods are used to turn corpora into computer-accessible information systems. Ideally, combining both approaches would yield a more complete information system, but in practice they are hard to bridge because of several problems. This paper presents an approach that addresses those problems: it combines inductive methods of automated text analysis and information extraction with knowledge of the referential information systems to add rich semantic layers of information to large historical corpora.
Original language: English
Publication status: Published - 2020
