Mining the Twentieth Century's History from the Time Magazine Corpus

Mike Kestemont, F.B. Karsdorp, Marten Düring

Research output: Chapter in book/volume; Contribution to conference proceedings; Academic; peer-reviewed


Abstract

In this paper we report on an explorative study of the history of the twentieth century from a lexical point of view. As data, we use a diachronic collection of 270,000+ English-language articles harvested from the electronic archive of the well-known Time Magazine (1923–2006). We attempt to automatically identify significant shifts in the vocabulary used in this corpus using efficient, yet unsupervised computational methods, such as Parsimonious Language Models. We offer a qualitative interpretation of the outcome of our experiments in the light of momentous events in the twentieth century, such as the Second World War or the rise of the Internet. This paper follows up on a recent string of frequentist approaches to studying cultural history (‘Culturomics’), in which the evolution of human culture is studied from a quantitative perspective, on the basis of lexical statistics extracted from large, textual data sets.
Original language: English
Title of host publication: Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)
Publisher: Association for Computational Linguistics (ACL)
Pages: 62
Number of pages: 70
Print ISBN: 978-1-937284-85-5
Status: Published - 2014
