Abstract
Many people are multilingual and may draw on multiple language varieties when writing messages. This paper takes a first step towards analyzing and detecting code-switching within words. We first segment words into smaller units and then identify words composed of sequences of subunits associated with different languages. We demonstrate the method on Twitter data containing both Dutch and dialect varieties labeled as Limburgish, a minority language.
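The two steps named in the abstract (segment a word into subunits, then check whether those subunits are associated with different languages) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how the paper segments words or associates subunits with languages, so greedy longest-match segmentation and a hand-written lookup table stand in for both. The inventory entries are placeholder strings, not real Dutch or Limburgish data, and the names `SUBUNIT_LANGUAGES`, `segment`, and `is_intra_word_switch` are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy inventory mapping subunits to the languages they are
# associated with. The strings are placeholders, not real linguistic data.
SUBUNIT_LANGUAGES: Dict[str, Set[str]] = {
    "foo": {"dutch"},
    "bar": {"limburgish"},
    "en": {"dutch", "limburgish"},  # shared subunit: gives no language evidence
}


def segment(word: str, inventory: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Greedy longest-match segmentation (a stand-in, not the paper's method).

    Returns the list of subunits, or None if the word cannot be fully covered.
    """
    units: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate substring first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in inventory:
                units.append(word[i:j])
                i = j
                break
        else:
            # No known subunit starts at position i.
            return None
    return units


def is_intra_word_switch(word: str, inventory: Dict[str, Set[str]]) -> bool:
    """Flag a word whose unambiguous subunits point to more than one language."""
    units = segment(word, inventory)
    if units is None:
        return False
    # Only subunits tied to exactly one language count as evidence.
    languages = {
        next(iter(inventory[u])) for u in units if len(inventory[u]) == 1
    }
    return len(languages) > 1


if __name__ == "__main__":
    for w in ["foobar", "fooen", "xyz"]:
        print(w, segment(w, SUBUNIT_LANGUAGES), is_intra_word_switch(w, SUBUNIT_LANGUAGES))
```

Greedy matching can miss a valid segmentation when a long subunit blocks a shorter one, so a real system would likely need a learned or exhaustive segmenter. Subunits shared by both languages are ignored here because they cannot distinguish between the two.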
Original language | English |
---|---|
Title of host publication | Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology |
Place of Publication | Stroudsburg |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 82-28 |
ISBN (Print) | 978-1-945626-08-1 |
Publication status | Published - 01 Aug 2016 |