Many people are multilingual, and they may draw from multiple language varieties when writing their messages. This paper is a first step towards analyzing and detecting code-switching within words. We first segment words into smaller units. We then identify words composed of sequences of subunits associated with different languages. We demonstrate our method on Twitter data in which both Dutch and dialect varieties labeled as Limburgish, a minority language, are used.
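The two steps named in the abstract (segment a word, then check whether its subunits belong to different languages) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how segmentation or language association is done, so the sketch substitutes a greedy longest-match segmenter over a small hand-made lexicon. The lexicon entries and the labels "A" and "B" (standing in for two language varieties, such as Dutch and Limburgish) are invented placeholders, not real linguistic data, and the function names `segment` and `is_mixed` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon mapping a subunit to a language label.
# The strings and labels are invented placeholders, not real data.
SUBUNIT_LANG: Dict[str, str] = {
    "ab": "A",
    "abc": "A",
    "fg": "A",
    "de": "B",
}


def segment(word: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Greedy longest-match segmentation (an assumed stand-in method).

    Returns the list of subunits, or None if some part of the word
    cannot be matched against the lexicon.
    """
    units: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                units.append(word[i:j])
                i = j
                break
        else:
            # No lexicon entry starts at position i.
            return None
    return units


def is_mixed(word: str, lexicon: Dict[str, str]) -> bool:
    """True if the word's subunits are associated with more than one language."""
    units = segment(word, lexicon)
    if units is None:
        return False
    return len({lexicon[u] for u in units}) > 1
```

With this toy lexicon, `is_mixed("abcde", SUBUNIT_LANG)` is true because the word splits into an "A" subunit followed by a "B" subunit, while `is_mixed("abfg", SUBUNIT_LANG)` is false because both subunits carry the same label. Greedy matching is a simplification: it can fail on words that a backtracking or statistical segmenter would handle.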
Original language: English
Title of book/volume: Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
Place of Publication: Stroudsburg
Publisher: Association for Computational Linguistics (ACL)
Pages: 82-86
ISBN (Print): 978-1-945626-08-1
Publication status: Published - 01 Aug 2016

ID: 2747480