Experiences and lessons learned in publishing a large dataset: Detailed tables from the Population and Occupational Censuses 1947

Research output: Contribution to journal/periodical › Article › Scientific › peer-review

Abstract

Since the late 1990s, Dutch census publications have been digitized and made available for digital processing, and new analyses of the data have been presented at several productive conferences. In addition to the census publications, a large body of detailed census data survives in dossiers and sets of worksheets in the archive of Statistics Netherlands. Most of this material has been scanned as digital images. The detailed data of the Population Census 1947 are the first such set to be made available in machine-processable form. This article describes the extensive steps taken to prepare the resulting dataset. Special attention is paid to preparing a dataset consisting of a very large number of files, to the organization of the dataset, and to how the process was documented. The result is a systematic and reproducible method for preparing a dataset of this size. Publishing the data in the preferred format of CSV text files appears to offer ample opportunities for further analysis.
Original language: English
Journal: Research Data Journal for the Humanities and Social Sciences
Publication status: Submitted, 30 Nov 2020

Keywords

  • large dataset, census data, Netherlands, OCR, data entry, versioning, documentation method, preferred format, CSV text files