TY - CHAP
T1 - Constructing a Recipe Web from Historical Newspapers
AU - Wevers, Melvin
AU - van Erp, Marieke
AU - Huurdeman, Hugo
PY - 2018
Y1 - 2018
N2 - Historical newspapers provide a lens on customs and habits of the past. For example, recipes published in newspapers highlight what and how we ate and thought about food. The challenge here is that newspaper data is often unstructured and highly varied, digitised historical newspapers add an additional challenge, namely that of fluctuations in OCR quality. Therefore, it is difficult to locate and extract recipes from them. We present our approach based on distant supervision and automatically extracted lexicons to identify recipes in digitised historical newspapers, to generate recipe tags, and to extract ingredient information. We provide OCR quality indicators and their impact on the extraction process. We enrich the recipes with links to information on the ingredients. Our research shows how combining natural language processing, machine learning, and semantic web can be used to construct a rich dataset from heterogeneous newspapers for the historical analysis of food culture.
AB - Historical newspapers provide a lens on customs and habits of the past. For example, recipes published in newspapers highlight what and how we ate and thought about food. The challenge here is that newspaper data is often unstructured and highly varied, digitised historical newspapers add an additional challenge, namely that of fluctuations in OCR quality. Therefore, it is difficult to locate and extract recipes from them. We present our approach based on distant supervision and automatically extracted lexicons to identify recipes in digitised historical newspapers, to generate recipe tags, and to extract ingredient information. We provide OCR quality indicators and their impact on the extraction process. We enrich the recipes with links to information on the ingredients. Our research shows how combining natural language processing, machine learning, and semantic web can be used to construct a rich dataset from heterogeneous newspapers for the historical analysis of food culture.
U2 - 10.1007/978-3-030-00671-6_13
DO - 10.1007/978-3-030-00671-6_13
M3 - Contribution to conference proceedings
BT - ISWC 2018 proceedings
PB - Springer
ER -