Since the late 1990s, Dutch census publications have been digitized and made available for digital processing. New analyses of the data have been presented at several fruitful conferences. Besides the census publications, a mass of detailed census data is held in dossiers and sets of worksheets in the archive of Statistics Netherlands. Most of that material has been scanned into digital images. The detailed data of the 1947 Population Census is the first such set to be made available in digitally processable form. This article describes the extensive steps taken to prepare the resulting dataset. Special attention is paid to preparing a dataset with a very large number of files, to the organization of the dataset, and to the way the process was documented. The result is a systematic and reproducible method for preparing such a large dataset. Presenting the data in the preferred format of CSV text files appears to offer ample opportunities for further analysis.
|Journal||Research Data Journal for the Humanities and Social Sciences|
|Publication status||Submitted - 30 Nov 2020|
- large dataset - census data - Netherlands - OCR - data entry - versioning - documentation method - preferred format - CSV text files