Applying automatically parsed corpora to the study of language variation

Jelke Bloem, A.P. Versloot, F.P. Weerman

Research output: Chapter in book/book part; Contribution to conference proceedings; Academic; Peer-reviewed


In this work, we discuss the benefits of using automatically parsed corpora to study language variation. The study of language variation is an area of linguistics in which quantitative methods have been particularly successful. We argue that the large datasets that can be obtained using automatic annotation can help drive further research in this direction, providing sufficient data for the increasingly complex models used to describe variation. We demonstrate this by replicating and extending a previous quantitative variation study that used manually and semi-automatically annotated data. We show that while the study cannot be replicated completely due to limitations of the existing automatic annotation, we can draw at least the same conclusions as the original study. In addition, we demonstrate the flexibility of this method by extending the findings to related linguistic constructions and to another domain of text, using additional data.

Original language: English
Title: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
Editors: Junichi Tsujii, Jan Hajic
Place of publication: Dublin
Publisher: Dublin City University and Association for Computational Linguistics
Status: Published - 2014
