Page Embeddings: Extracting and Classifying Historical Documents with Generic Vector Representations

Carsten Schnober*, Renate Smit, Manjusha Kuruppath, Kay Pepping, Leon van Wissen, Lodewijk Petram

*Corresponding author for this work

Research output: Chapter in book/volume; Contribution to conference proceedings; Academic; Peer-reviewed

Abstract

We propose a neural network architecture designed to generate region and page embeddings for boundary detection and classification of documents within a large and heterogeneous historical archive. Our approach is versatile and can be applied to other tasks and datasets. This method enhances the accessibility of historical archives and promotes a more inclusive utilization of historical materials.
Original language: English
Title: Proceedings of the Computational Humanities Research Conference 2024
Subtitle: Aarhus, Denmark, December 4-6, 2024
Pages: 999-1011
Number of pages: 13
Volume: 3834
Status: Published - 18 Nov 2024

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR Workshop Proceedings
ISSN (print): 1613-0073
