The Semantics of Structure in Large Historical Corpora

Research output: Contribution to conference › Paper › Academic › peer-review


Large historical corpora that are too big to be processed manually can be structured in two ways. The first is an inductive method that extracts implicit entities and meaning from textual (and sometimes visual) content. Using AI or existing, manually compiled lists of entities, these entities are converted into information. The second, which Colavizza (2019) calls referential information systems, takes existing reference systems (such as archival indexes) and uses them to contextualize individual documents. Both methods are used to turn corpora into computer-accessible information systems. Ideally, combining both approaches would yield a more complete information system, but in practice they are hard to bridge because of a number of distinct problems. This paper presents an approach that addresses those problems: it combines inductive methods of automated text analysis and information extraction with knowledge of the referential information systems to add rich semantic layers of information to large historical corpora.
Original language: English
Status: Published - 2020
Event: Digital Humanities 2020: intersections - Ottawa, Canada
Duration: 20 Jul 2020 to 25 Jul 2020
Conference number: 31


Conference: Digital Humanities 2020
Abbreviated title: DH2020


