Making more of volumes. Dissecting and searching norms in Books of Ordinances (1500-1850s)

Romein, A. (Speaker)

Activity: Talk or presentation (Academic)

Description

The Entangled Histories project used early modern printed normative texts. Computers long had significant problems reading the Dutch Gothic print, and even the Dutch Roman print, in which many of the sources are set. Using the Handwritten Text Recognition suite Transkribus (v.1.07-v.1.10), we reprocessed the original scans that had poor-quality OCR and obtained a Character Error Rate (CER) well below the <5% we had initially expected. This significant improvement makes 75,000 pages of printed normative texts from 108 books originating from the Seventeen Provinces searchable.

Each text (norm) in the books concerns one or more topics or categories. A selection of the normative texts was manually labelled with internationally used hierarchical categories. We then used Annif, a tool for automatic subject indexing, to train the computer to assign these categories by itself. The resulting automatic metadata makes it easier to find relevant texts and allows further analysis.
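For readers unfamiliar with the metric: CER is conventionally the character-level edit distance (substitutions + deletions + insertions) between a ground-truth transcription and the recognised text, divided by the length of the ground truth. A CER below 5% therefore means fewer than 5 errors per 100 reference characters. The minimal sketch below implements that standard definition. It is not Transkribus's own code, which may normalise whitespace or characters differently (an assumption). The function names and sample strings are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn `hyp` into `ref` (dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # ref char missing in hyp (deletion)
                            curr[j - 1] + 1,     # extra char in hyp (insertion)
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical OCR confusion of 'n' and 'u', common in Gothic print
print(f"{cer('ordonnantie', 'ordonnautie'):.2%}")
```

A single misread letter in an 11-character word already gives a CER of about 9%, so a page-level CER under 5% requires most words to be recognised without error.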
Period: 08 Dec 2020
Degree of Recognition: International

Keywords

  • digital humanities
  • transkribus
  • annif
  • automatic metadating
  • MPIER
  • ordinances