Evaluating State‐of‐the‐Art Handwritten Text Recognition (HTR) Engines; with Large Language Models (LLMs) for Historical Document Digitisation

Gundram Leifert, Christel Annemieke Romein*, Achim Rabus, Phillip Benjamin Ströbel, Benjamin Kiessling, Tobias Hödel

*Corresponding author for this work

Research output: Contribution to conference › Poster › Academic

Abstract

Handwritten Text Recognition (HTR) engines play a pivotal role in transforming historical handwritten documents into machine‐readable formats, enabling efficient access to and analysis of invaluable information. We present a comprehensive evaluation of state‐of‐the‐art HTR engines, including PyLaia, IDA, and TrOCR, to help researchers make informed decisions when datafying historical documents. To ensure a fair comparison of engine performance, we use diverse datasets that span different languages, scripts, levels of complexity, and historical periods. As a benchmark for improvement, we include datasets previously processed with Transkribus/CitLab HTR+, which lets us assess the engines' recognition accuracy against an established baseline. We evaluate performance using the Character Error Rate (CER).
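The abstract does not state exactly how CER is computed. A common definition, assumed here rather than taken from the poster, is the character-level Levenshtein distance between the recognised text and the ground-truth transcription, divided by the number of ground-truth characters: CER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions and N is the reference length. The sketch below implements that definition; the function names are hypothetical, and pooling a test set as total edits over total reference characters is likewise an assumed convention.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn `reference` into `hypothesis` (two-row dynamic programming)."""
    # Row for the empty reference prefix: j insertions reach hypothesis[:j].
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # i deletions turn reference[:i] into the empty string
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete ref_char
                current[j - 1] + 1,      # insert hyp_char
                previous[j - 1] + cost,  # substitute (or match when cost == 0)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER of one line: edit distance / number of reference characters (assumed definition)."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Pooled CER over (reference, hypothesis) pairs: total edits / total reference characters."""
    total_edits = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    return total_edits / total_chars


if __name__ == "__main__":
    # One substitution ("m" -> "n") in a 9-character reference gives a CER of 1/9.
    print(character_error_rate("Amsterdam", "Amsterdan"))
```

Because insertions are counted but N is fixed by the reference, CER can exceed 1 when an engine produces many spurious characters. Pooling edits across lines, rather than averaging per-line CERs, keeps short lines from dominating the score.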
Original language: English
Status: Published - 7 Dec 2023
