AI Unleashed: Testing (Semi-)Automatic Metadata against Human Labeling on Handwritten Sources in City-state Bern (1528-1795)

  • Romein, A. (Speaker)
  • Sara Floor Veldhoen (Speaker)
  • Andreas Wagner (Speaker)
  • J.C. Romein (Speaker)

Activity: Talk or presentation (Academic)

Description

Early modern governments enacted and published numerous norms (legislation) over time. In the 1990s, Michael Stolleis and Karl Härter initiated a major project at the (then) Max Planck Institute for European Legal History (now the Max Planck Institute for Legal History and Legal Theory). The project aimed to provide an exhaustive overview of normative texts from 68 jurisdictions within Central Europe, and resulted in the "Repertorium der Policeyordnungen der Frühen Neuzeit" series and a database containing approximately 200,000 normative texts. These texts were systematically organized into hierarchical categories, which allows researchers to compare them across jurisdictions.
One jurisdiction extensively studied in this project was the 'city-state' of Bern, where 4932 normative texts (Mandaten) were published from 1528 until 1795 and recorded in the Book of Mandates. Dr. Claudia Schott-Volm provided a comprehensive overview and labelling of the Bern texts in 2006, offering valuable insights into both the normative texts and their corresponding jurisdictions. It is important to note that while Bern was officially designated a 'city-state', its territorial dominion covered most of the modern-day cantons of Bern, Aargau, and Vaud, as well as certain areas of other cantons.
The main focus of the current study is to assess the feasibility of automatically labelling these legal texts using Artificial Intelligence (AI). To achieve this, the original manual labels serve as benchmark data. Legal texts are known for their clear and precise phrasing, which makes them promising candidates for AI-based classification. In this endeavour, the Annif tool, developed by the National Library of Finland, will be employed. The effectiveness of various backends, such as TF-IDF, Omikuji, and fastText, will be evaluated to determine the most suitable approach for categorizing early modern handwritten texts.
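As an illustration of how automatic labels can be scored against the manual benchmark, the following is a minimal sketch of micro-averaged precision, recall, and F1 over label sets. It is not the evaluation used in the study: the function name `micro_prf`, the document identifiers, and the label names are hypothetical, and the toy data stands in for real backend output.

```python
from typing import Dict, Set, Tuple


def micro_prf(gold: Dict[str, Set[str]],
              predicted: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all documents.

    gold      maps a document id to its human-assigned labels.
    predicted maps a document id to the labels suggested automatically.
    """
    tp = fp = fn = 0
    for doc_id, gold_labels in gold.items():
        pred_labels = predicted.get(doc_id, set())
        tp += len(gold_labels & pred_labels)   # labels both agree on
        fp += len(pred_labels - gold_labels)   # suggested but not in benchmark
        fn += len(gold_labels - pred_labels)   # in benchmark but not suggested
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy data: two mandates with invented label names.
gold = {"m1": {"religion", "sabbath"}, "m2": {"trade"}}
backend_output = {"m1": {"religion"}, "m2": {"trade", "markets"}}

precision, recall, f1 = micro_prf(gold, backend_output)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Running the same function on the output of each backend would give directly comparable scores, under the assumption that all backends are evaluated on the same held-out documents.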
The research methodology involves several crucial steps in training the AI model. The Mandates from the entire period are digitized, and Handwritten Text Recognition (Transkribus) is applied to create transcriptions. To ensure coherence and consistency, a Simple Knowledge Organization System (SKOS) vocabulary is used to reconcile the labels of the normative texts. The AI model within Annif is then trained and tested using a train/validation/test split of the Mandaten data. However, the multilayered nature of the SKOS vocabulary, with 1800 categories distributed over four distinct levels, may present challenges and could reduce the accuracy of Annif, particularly for the lowest and most specific topics. Recommendations will therefore be provided on how Annif can be used effectively with handwritten material, taking the quality of the Handwritten Text Recognition into account.
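The split and the per-level difficulty described above can be sketched as follows. This is an illustrative sketch only: the 80/10/10 proportions, the dotted label notation (for example "1.2.3.4" for a fourth-level category), the single label per document, and all function names are assumptions, not details taken from the project.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_documents(doc_ids: Sequence[str], train: float = 0.8,
                    validation: float = 0.1,
                    seed: int = 42) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle reproducibly and cut into train, validation and test parts."""
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * validation)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def truncate_to_level(label: str, level: int) -> str:
    """Keep only the first `level` components of a dotted label."""
    return ".".join(label.split(".")[:level])


def accuracy_at_level(gold: Dict[str, str], predicted: Dict[str, str],
                      level: int) -> float:
    """Share of documents whose labels agree once cut to the given level."""
    if not gold:
        return 0.0
    hits = sum(
        1 for doc_id, gold_label in gold.items()
        if doc_id in predicted
        and truncate_to_level(gold_label, level)
        == truncate_to_level(predicted[doc_id], level)
    )
    return hits / len(gold)


# 4932 is the number of mandates given in the abstract; the ids are invented.
all_ids = [f"mandate-{i:04d}" for i in range(4932)]
train_ids, val_ids, test_ids = split_documents(all_ids)

# Hypothetical labels: the first prediction is right down to level 2 only.
gold_labels = {"m1": "1.2.3.4", "m2": "2.1.1.3"}
predicted_labels = {"m1": "1.2.5.1", "m2": "2.1.1.3"}
for level in range(1, 5):
    print(level, accuracy_at_level(gold_labels, predicted_labels, level))
```

Reporting agreement separately for each of the four levels makes it visible whether errors are concentrated in the most specific categories, which is the concern raised for the lowest level of the hierarchy.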
In conclusion, the project initiated by Stolleis and Härter has yielded invaluable insights into the vast realm of normative texts within Central Europe. The subsequent testing of AI-based classification using Annif holds the potential to revolutionize the analysis and categorization of legal texts. As technology continues to advance, the possibility of AI autonomously labelling legal texts represents a significant leap forward in legal historical research. It promises to enhance access to historical legal knowledge, enrich our understanding of early modern governance, and offer efficiency and accuracy in handling large volumes of legal documents for legal professionals and policymakers.
The results from the initial testing of the AI model will be discussed, highlighting any notable patterns, trends, or challenges encountered during the experiments. The implications of the research will be assessed within the context of legal history research and AI applications in other disciplines. This abstract offers an overview of the research project, emphasizing its objectives, methodology, and potential impact. By bridging the domains of legal history and AI, this study contributes to both fields and opens up new avenues for exploring historical archives and generating fresh insights into the past. An AI model capable of autonomously labelling legal texts could streamline research and make historical legal information accessible to a wider audience. As the project progresses, the implications and contributions of this research will continue to evolve.
Bibliography
• Härter, Karl and Stolleis, Michael: https://www.lhlt.mpg.de/forschungsprojekt/repertorium-der-policeyordnungen?c=2124983 [Accessed: 1 August 2023]
• Schott-Volm, Claudia (2006): Repertorium der Policeyordnungen der Frühen Neuzeit (#7). Orte der Schweizer Eidgenossenschaft: Bern und Zürich. Frankfurt am Main: Klostermann.
Period: 07 Dec 2023
Event title: Computational Humanities Research 2023
Event type: Conference
Conference number: 4
Location: Paris, France
Degree of Recognition: International

Keywords

  • Annif
  • Semi-automatic metadata
  • Bern
  • Police Ordinances