TY - CONF
T1 - Evaluating State‐of‐the‐Art Handwritten Text Recognition (HTR) Engines with Large Language Models (LLMs) for Historical Document Digitisation
AU - Leifert, Gundram
AU - Romein, Christel Annemieke
AU - Rabus, Achim
AU - Ströbel, Phillip Benjamin
AU - Kiessling, Benjamin
AU - Hödel, Tobias
PY - 2023/12/7
Y1 - 2023/12/7
N2 - Handwritten Text Recognition engines play a pivotal role in transforming historical handwritten documents into machine‐readable formats, facilitating efficient access and analysis of invaluable information. We present a comprehensive evaluation of cutting‐edge Handwritten Text Recognition engines, including PyLaia, IDA, and TrOCR. The study aims to assist researchers in making informed decisions for datafication of historical documents. We ensure a fair comparison of the engine performance by utilizing diverse datasets comprising varying languages, scripts, complexities, and historical periods. As a benchmark for improvement, we incorporate datasets previously processed with Transkribus/CitLab HTR+, allowing us to assess the engines' recognition accuracy. By analysing the Character Error Rate (CER), we evaluate their performance.
AB - Handwritten Text Recognition engines play a pivotal role in transforming historical handwritten documents into machine‐readable formats, facilitating efficient access and analysis of invaluable information. We present a comprehensive evaluation of cutting‐edge Handwritten Text Recognition engines, including PyLaia, IDA, and TrOCR. The study aims to assist researchers in making informed decisions for datafication of historical documents. We ensure a fair comparison of the engine performance by utilizing diverse datasets comprising varying languages, scripts, complexities, and historical periods. As a benchmark for improvement, we incorporate datasets previously processed with Transkribus/CitLab HTR+, allowing us to assess the engines' recognition accuracy. By analysing the Character Error Rate (CER), we evaluate their performance.
KW - LLMs
KW - HTR
U2 - 10.5281/ZENODO.8102666
DO - 10.5281/ZENODO.8102666
M3 - Poster
ER -