'Entangled Histories' and the use of Annif

Romein, A. (Speaker)

Activity: Talk or presentation (Academic)


In the early modern period, rules were announced by the city crier. He walked or rode through the city to designated locations and read new rules to the inhabitants. After they had been read aloud, the printed texts were fixed to “well-known places” (e.g. church doors, the market square) so that people could reread them. An estimated 70% of people in the Republic could read, so for the remaining inhabitants having the new rules read aloud was still very important. The rules had to make sense so that people could learn them by heart. Hence the repetitiveness in the texts, which is unsurprising given the important oral tradition of the 16th and 17th centuries.
Ordinances, or placards, were affixed to known places, which made them official. The provincial estates considered it important to print a selection of their agreed-upon texts in books of ordinances, which served lawyers as a reference work. These books do not form a complete overview, but they do give a sense of what government officials deemed important.
To study the potential differences, the first step in the project is to analyse the layout of the pages and enhance the readability of the early modern texts. Humans can read the texts fairly well, although gothic fonts prove more challenging than roman fonts. The computer makes no such distinction; it finds both a challenge. Within the Google Books project, OCR (Optical Character Recognition) has been applied, but when you attempt to copy the text (to a notepad, for example) the result proves useless because of the numerous errors.
For the next phase of the research, we focused on one of these books to provide a proof of concept: Groot Gelders placaet-boeck, Volume 2. The transcribed text of this book was segmented into separate laws with a rule-based approach, the code for which can be found on GitHub. The approach used font-size information from ABBYY whenever available, together with keyword matching against common title words, and reached over 95% accuracy for the segmentation.
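The rule-based segmentation described above could look roughly like the following sketch. It is not the project's code: the `Line` type, the keyword list, the "larger than body text" rule, and the sample lines are all hypothetical, chosen only to illustrate how a font-size signal and a keyword fallback can be combined.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical title keywords; the project's actual list is not given in the text.
TITLE_KEYWORDS = ("placaet", "ordonnantie", "edict", "publicatie")


@dataclass
class Line:
    text: str
    font_size: Optional[float] = None  # None when the OCR output carries no font size


def is_title(line: Line, body_size: float) -> bool:
    """Guess whether a line opens a new law."""
    # Rule 1 (assumed): a font larger than the body text suggests a title.
    if line.font_size is not None and line.font_size > body_size:
        return True
    # Rule 2 (assumed): otherwise fall back to common title words.
    lowered = line.text.lower()
    return any(lowered.startswith(keyword) for keyword in TITLE_KEYWORDS)


def segment(lines: List[Line], body_size: float) -> List[List[str]]:
    """Group consecutive lines into laws, starting a new group at each title."""
    laws: List[List[str]] = []
    for line in lines:
        if is_title(line, body_size) or not laws:
            laws.append([])
        laws[-1].append(line.text)
    return laws


# Made-up sample lines, not taken from the placaet-boeck.
sample = [
    Line("Placaet op de jacht", 14.0),
    Line("Niemand sal jagen", 10.0),
    Line("Ordonnantie op de munte"),  # no font size: the keyword rule applies
    Line("De munte sal wesen", 10.0),
]
laws = segment(sample, body_size=10.0)
```

Here `laws` holds two groups of lines, one per detected title. A real implementation would need to handle OCR noise in the keywords and titles that span several lines.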
The laws were labelled with subjects from a categorisation created by the German Max-Planck-Institute for European Legal History (MPIeR). The same categories have been applied internationally, in over 68 early modern European states. It is a four-level hierarchical categorisation of police legislation, with five top-level categories: Social Order and Religion, Public Safety and Justice, Schooling, Economic Affairs, and Infrastructure. The topics are quite distinguishable until at least level three. International laws were tagged with a separate topic outside of police legislation. With Annif, we have tried to have the computer automatically categorise these texts in accordance with the MPIeR categorisation.
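Because the categorisation is hierarchical, automatically assigned labels can be compared with manual ones at any level, for instance level three. The sketch below illustrates this under a loud assumption: categories are written as dot-separated codes such as `1.2.3.4`, which is a hypothetical notation and not necessarily the MPIeR one. It does not call Annif; the predicted labels are simply given as strings.

```python
from typing import Sequence


def truncate(code: str, depth: int) -> str:
    """Keep only the first `depth` levels of a dot-separated category code."""
    return ".".join(code.split(".")[:depth])


def agreement_at_depth(
    predicted: Sequence[str], actual: Sequence[str], depth: int
) -> float:
    """Share of laws whose predicted and manual codes match down to `depth`."""
    pairs = list(zip(predicted, actual))
    if not pairs:
        return 0.0
    hits = sum(truncate(p, depth) == truncate(a, depth) for p, a in pairs)
    return hits / len(pairs)


# Made-up codes for two laws; not real MPIeR categories or project results.
predicted = ["1.2.3.4", "2.1.1.1"]
manual = ["1.2.3.9", "2.2.1.1"]

level_one = agreement_at_depth(predicted, manual, depth=1)
level_three = agreement_at_depth(predicted, manual, depth=3)
```

With these made-up codes, both laws agree at the top level but only one agrees at level three, which shows how coarser levels are easier to match than finer ones.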
Period: 2 Apr 2020
Held at: Koninklijke Bibliotheek, Netherlands
Degree of recognition: International