Applying automatically parsed corpora to the study of language variation

Jelke Bloem, A.P. Versloot, F.P. Weerman

Research output: Chapter in book/volume, Contribution to conference proceedings, Scientific, peer-review

Abstract

In this work, we discuss the benefits of using automatically parsed corpora to study language variation. The study of language variation is an area of linguistics in which quantitative methods have been particularly successful. We argue that the large datasets that can be obtained using automatic annotation can help drive further research in this direction, providing sufficient data for the increasingly complex models used to describe variation. We demonstrate this by replicating and extending a previous quantitative variation study that used manually and semi-automatically annotated data. We show that while the study cannot be replicated completely due to limitations of the existing automatic annotation, we can draw at least the same conclusions as the original study. In addition, we demonstrate the flexibility of this method by extending the findings to related linguistic constructions and to another domain of text, using additional data.
Original language: English
Title of host publication: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
Editors: Junichi Tsujii, Jan Hajic
Place of Publication: Dublin
Publisher: Dublin City University and Association for Computational Linguistics
Pages: 1974-1985
Publication status: Published - 2014
