Abstract
The question of how to reliably estimate the lexical diversity of a particular text (or text collection) has often been asked by linguists and literary scholars alike. This short paper introduces a way of operationalizing functional diversity measurements by means of token-based embeddings, and argues that functional diversity is not only a practically advantageous but also a theoretically relevant addition to the Computational Humanities Research toolkit. By means of an experiment on the historical ARCHER corpus, we show that lexical diversity at the level of functional groups is less sensitive to orthographic variation and provides insight into an important and often disregarded dimension of vocabulary diversity in textual data.
Original language | English |
---|---|
Title | Proceedings of the Computational Humanities Research Conference 2022 |
Editors | Folgert Karsdorp, Alie Lassche, Kristoffer Nielbo |
Publisher | CEUR Workshop Proceedings |
Pages | 114-126 |
Volume | 3290 |
Status | Published - Nov. 2022 |