The effect of Zipf’s law on the learnability of language

Activity: Teaching/Examination/Supervision, Student/Intern supervision


How learnable is language as a distribution? And how do the properties of this distribution affect its learnability? These are natural and important questions in statistics and machine learning, since they are fundamental to choosing models and forming expectations about their performance. Statistical theory suggests that most of the distributional properties of language make it particularly difficult to learn. A prominent example of such a property is Zipf’s law, which asserts that the word distribution in language roughly follows a power law, and which is the main culprit for the massive sparsity that makes language modelling such a complex task. Learnability is also the subject of long and intense debates in cognitive linguistics. There, by contrast, researchers have recently begun to hypothesise that distributional aspects of language such as Zipf’s law exist precisely because they make communication systems easier to learn. In this thesis, several LSTMs were trained on a large set of corpora that differed in their conformity to Zipf’s law. The perplexity of each model was then measured and compared. The results show that weaker conformity to Zipf’s law reduces the learnability of a language.
Period: Mar 2021 to Jul 2021
Examinee: Max Bongers
Examination held at
  • University of Amsterdam