Evaluating State‐of‐the‐Art Handwritten Text Recognition (HTR) Engines; with Large Language Models (LLMs) for Historical Document Digitisation

Gundram Leifert, Christel Annemieke Romein*, Achim Rabus, Phillip Benjamin Ströbel, Benjamin Kiessling, Tobias Hödel

*Corresponding author for this work

Research output: Contribution to conference › Poster › Scientific

Abstract

Handwritten Text Recognition (HTR) engines play a pivotal role in transforming historical handwritten documents into machine‐readable formats, facilitating efficient access to and analysis of the information they contain. We present a comprehensive evaluation of cutting‐edge HTR engines, including PyLaia, IDA, and TrOCR. The study aims to help researchers make informed decisions about the datafication of historical documents. To ensure a fair comparison of engine performance, we use diverse datasets covering different languages, scripts, levels of complexity, and historical periods. As a benchmark for improvement, we include datasets previously processed with Transkribus/CitLab HTR+, which allows us to assess the engines' recognition accuracy against an established baseline. We evaluate performance using the Character Error Rate (CER).
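The abstract does not state how the CER is computed. As a minimal sketch, assuming the common definition (character-level Levenshtein distance between the reference and the recognised text, divided by the length of the reference), it could be calculated as follows. The function names `levenshtein` and `cer` are illustrative and not taken from the poster, and any text normalisation applied before scoring is left out.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate under the assumed definition:
    edit distance divided by the number of reference characters."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)
```

Under this definition, a lower CER means better recognition, and the value can exceed 1.0 when the recognised text contains many more characters than the reference.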
Original language: English
Publication status: Published - 07 Dec 2023

Keywords

  • LLMs
  • HTR
