In this article, we report a large-scale corpus study aimed at tackling the (controversial) question to what extent the European national varieties of Dutch, that is, Belgian and Netherlandic Dutch, exhibit morpho-syntactic differences. Instead of relying on a manual selection of cases of morphosyntactic variation, we first marshal large bilingual parallel corpora and machine translation software to identify semiautomatically, in an extensively data-driven fashion, loci of variation from various “corners” of Dutch grammar. We then gauge the distribution of con-structional alternatives in a nationally as well as stylistically stratified corpus for a representative selection of twenty alternation patterns. We find that natiolectal variation in the grammar of Dutch is far more prevalent than often assumed, especially in less edited text types, and that it shows up in inflection phenomena, lexically conditioned syntactic variation, and pure word order permutations. Another key finding is that many cases of synchronic probabilistic asymmetries reflect a diachronic difference between the two varieties: Netherlandic Dutch often tends to be ahead in cases of ongoing grammatical change, with Belgian Dutch holding on somewhat longer to obsolescent features of the grammar.
- computational linguistics, corpus linguistics, Dutch, grammatical variation, natiolectal variation, parallel corpus