Abstract
In this work, we discuss the benefits of using automatically parsed corpora to study language variation. The study of language variation is an area of linguistics in which quantitative methods have been particularly successful. We argue that the large datasets that can be obtained using automatic annotation can help
drive further research in this direction, providing sufficient data for the increasingly complex models used to describe variation. We demonstrate this by replicating and extending a previous quantitative variation study that used manually and semi-automatically annotated data.
We show that while the study cannot be replicated completely due to limitations of the existing automatic annotation, we can draw at least the same conclusions as the original study. In addition, we demonstrate the flexibility of this method by extending the findings to related linguistic constructions and to another
domain of text, using additional data.
Original language | English |
---|---|
Title of host publication | Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers |
Editors | Junichi Tsujii, Jan Hajic |
Place of Publication | Dublin |
Publisher | Dublin City University and Association for Computational Linguistics |
Pages | 1974-1985 |
Publication status | Published - 2014 |