TY - CONF
T1 - Computer Vision for Removing Blind Spots in a Migrant Registration System
AU - Hoekstra, F.G.
AU - van Faassen, Marijke
PY - 2021/3/26
Y1 - 2021/3/26
N2 - In this paper we discuss extending established methodology with computer vision, making it possible to almost literally see into the blind spots that established methods alone leave in large serial collections. We argue that this method overcomes the dangers of implicit selection commonly referred to as ‘cherry picking’, or selecting the ‘most important’ files. Furthermore, by combining traditional and DH methodology, it becomes visible what is and what is not in the collection as a whole. As such, it is a replacement for traditionally leafing through an archive. As a method of source criticism, this offers many more possibilities than established methodologies alone. We illustrate our findings with our project Migrant, Mobilities and Connection on Dutch-Australian emigration 1950-1992. The main research question of the project is which factors determined the migration experience as a whole and what the relation was between policy, civil society and individual agency. Together with Canada, Australia was the main destination of Dutch emigrants after 1945, receiving around 160,000 migrants from 1950 to 1992. The point of departure is a registration system kept by the Dutch migration authorities based at the Dutch consulates. It consists of 50,000 cards (100,000 scanned images) and contains data about the interactions between migrants and migration officers from 1950 to 1992. The cards contain a wealth of information that is not readily accessible, because the writing on them is a mix of manuscript and typescript distributed unevenly over the cards. Before answering the main question, we first had to determine how to study Dutch-Australian emigrants through this extensive registration system, which is largely impenetrable because of its size and composition. Traditionally, historians would tackle a collection like this by taking a sample of the cards and additionally studying the most interesting cases.
However, case selection is difficult because it is impossible to read, or even leaf through, a hundred thousand images. Moreover, it is not clear how cases fit into the registration system and whether hidden features of the system influence the size of files. Thus, the combination of a large archival collection of mostly undifferentiated material with methodologies not devised for distant reading leads to blind spots for the historian and calls for additional methods to inspect the collection as a whole. The computer vision method we adopted measured the amount of writing on the cards. Viewed across the whole registration system, this yields a distribution of information over the cards. Combining this with traditional sampling, we were able to identify distinguishable groups of migrants (e.g. by religion, marital status or age). In the paper we will elaborate on the (non-)possibilities of relating these groups to the ‘information distribution’ as a whole, and of distinguishing changes in the policies and executive strategies of the Dutch migration authorities using these combined methods.
M3 - Paper
ER -