Introducing Functional Diversity: A Novel Approach to Lexical Diversity in (Historical) Corpora

F.B. Karsdorp, Enrique Manjavacas, Lauren Fonteyn

Research output: Chapter in book/volume; Contribution to conference proceedings; Academic; peer-reviewed


The question of how we can reliably estimate the lexical diversity of a particular text (collection) has often been asked by linguists and literary scholars alike. This short paper introduces a way of operationalizing functional diversity measurements by means of token-based embeddings, and argues that functional diversity is not only a practically advantageous but also a theoretically relevant addition to the Computational Humanities Research toolkit. By means of an experiment on the historical ARCHER corpus, we show that lexical diversity at the level of functional groups is less sensitive to orthographic variation and provides insight into an important and often disregarded dimension of vocabulary diversity in textual data.
Title of host publication: Proceedings of the Computational Humanities Research Conference 2022
Editors: Folgert Karsdorp, Alie Lassche, Kristoffer Nielbo
Publisher: CEUR Workshop Proceedings
Status: Published - Nov. 2022


