Abstract
We present novel software for extracting layout information from scans of historical documents. We do this using a ResNet backbone with a feature pyramid head. Region information is extracted directly into PageXML. For baseline extraction, we use a two-stage processing approach. The software has been applied successfully in several projects. The results show the feasibility of automatically labeling text lines and regions in historical documents.
Original language | English |
---|---|
Pages | 67-72 |
Number of pages | 6 |
Publication status | Published - 25 Aug 2023 |
Event | HIP '23: Proceedings of the 7th International Workshop on Historical Document Imaging and Processing - San José, California, United States. Duration: 25 Aug 2023 → 26 Aug 2023. https://dl.acm.org/doi/proceedings/10.1145/3604951 |
Conference
Conference | HIP '23 |
---|---|
Abbreviated title | HIP '23 |
Country/Territory | United States |
City | San José |
Period | 25/08/2023 → 26/08/2023 |
Keywords
- Datasets
- Historical documents
- Neural networks
- Layout analysis