LED-A: an app for calculating linguistic distances between language varieties

Activity: Talk or presentation (Academic)

Description

Dialectologists, sociolinguists, and other researchers interested in language variation and change often calculate linguistic distances between varieties. Linguistic distance measurements shed light on the relationships between varieties, their historical development and the effects of socio-geographic factors, thereby informing both academic research and practical applications in language policy and education. In dialectology, several tools for measuring linguistic distances have been developed, such as Visual Dialectometry (Goebl 2006), DiaTech (Aurrekoetxea et al. 2013), RuG/L04 (Kleiweg & Nerbonne 2023) and Gabmap (Nerbonne et al. 2011, Leinonen et al. 2016).

We present the Levenshtein Edit Distance App (LED-A) as a new tool for measuring and visualizing linguistic distances. LED-A shares features with the aforementioned programs, such as cluster analysis, multidimensional scaling, beam maps, network maps, area maps and RGB maps (maps that visualize the dialect landscape as a continuum). Our web app, however, differs in the design of its user interface and offers features that the other programs lack. LED-A is implemented as a Shiny app and uses a range of R packages.

For the user interface, we aimed to maximize flexibility, user-friendliness and intuitiveness. For the calculation of feature-based linguistic distances, Séguy’s (1973) binary distance measure is available. For the calculation of transcription-based distances, several variants of the Levenshtein distance (Levenshtein 1965, Kessler 1995) are available out of the box, without the need to tweak a configuration file. Current options are (i) the plain Levenshtein distance, which uses binary operation weights, (ii) the feature-based Levenshtein distance, which uses gradual operation weights derived from the IPA charts (Almeida & Braun 1986), and (iii) the PMI-based Levenshtein distance, which derives the operation weights by self-learning (Wieling et al. 2009). Both aggregated and individual word distances can be obtained, either on the basis of whole words or on the basis of vowel or consonant substitutions or indels only.
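
The plain variant can be illustrated with a minimal Python sketch. It is not LED-A's implementation (LED-A uses R packages). The names `binary_cost`, `levenshtein` and `mean_word_distance` are hypothetical, each character is treated as one segment (which ignores diacritics and multi-character segments), and the example transcriptions are made up. The `sub_cost` parameter only marks where gradual weights, such as feature-based or PMI-based ones, could be plugged in.

```python
from typing import Callable, Sequence


def binary_cost(a: str, b: str) -> float:
    """Binary substitution weight: 0 for identical segments, 1 otherwise."""
    return 0.0 if a == b else 1.0


def levenshtein(
    source: Sequence[str],
    target: Sequence[str],
    sub_cost: Callable[[str, str], float] = binary_cost,
    indel_cost: float = 1.0,
) -> float:
    """Minimum total cost of insertions, deletions and substitutions
    needed to turn `source` into `target` (hypothetical helper)."""
    # previous[j] = cost of turning the source prefix seen so far into target[:j]
    previous = [j * indel_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        current = [i * indel_cost]
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + indel_cost,          # delete s
                current[j - 1] + indel_cost,       # insert t
                previous[j - 1] + sub_cost(s, t),  # substitute s by t (or match)
            ))
        previous = current
    return previous[-1]


def mean_word_distance(words_a: Sequence[str], words_b: Sequence[str]) -> float:
    """Aggregate distance between two varieties: mean over paired words.
    Assumes both lists give the same items in the same order."""
    pairs = list(zip(words_a, words_b))
    if not pairs:
        raise ValueError("at least one word pair is required")
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


# Made-up transcriptions, for illustration only.
print(levenshtein("mɛlk", "mɛlək"))                            # 1.0 (one insertion)
print(mean_word_distance(["mɛlk", "hus"], ["mɛlək", "hus"]))   # 0.5
```

In this sketch, a gradual-weight variant would only replace `binary_cost` with a function returning values between 0 and 1; how LED-A derives such weights is not shown here.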

Additionally, linguistic distances can be calculated on the basis of acoustic samples. For each sample, a numerical feature representation is computed from Mel-frequency cepstral coefficients (MFCCs). Dynamic Time Warping (DTW) is then applied to these representations in order to measure distances between samples (Bartelds et al. 2020).
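
The DTW step can be sketched as follows, assuming the MFCC frames have already been extracted and are given as lists of numbers. This is a textbook DTW with Euclidean frame distances, not necessarily the exact procedure of Bartelds et al. (2020) or of LED-A; any normalisation is omitted, and the names `euclidean` and `dtw_distance` are hypothetical.

```python
import math
from typing import Sequence

Frame = Sequence[float]  # one feature vector per time frame (e.g. MFCCs)


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: Sequence[Frame], seq_b: Sequence[Frame]) -> float:
    """Cost of the cheapest monotonic alignment between two frame sequences."""
    if not seq_a or not seq_b:
        raise ValueError("both sequences must contain at least one frame")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = cheapest alignment of seq_a[:i] with seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of seq_a aligned to an already used frame of seq_b
                cost[i][j - 1],      # frame of seq_b aligned to an already used frame of seq_a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Toy one-dimensional "frames": the second sample holds one frame longer.
slow = [[0.0], [1.0], [1.0], [2.0]]
fast = [[0.0], [1.0], [2.0]]
print(dtw_distance(slow, fast))  # 0.0, the timing difference is warped away
```

Because the alignment may stretch either sequence, two samples that differ only in speaking rate come out close together, which is the property that makes DTW suitable for comparing recordings of different durations.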

Instead of calculating distances, it is also possible to import a table of distances that have already been calculated with other techniques and applications. For these imported distances, the user has almost the same functionality available as for distances calculated in LED-A itself.

To create maps with the visualization tool, it is sufficient to upload the coordinates of the places; coordinates that trace the outline of the area are not required. Several map backgrounds are available.

In this demonstration we will show how the tool can be used. We welcome input to expand LED-A. The tool itself, and more information about its implementation, can be found at led-a.org.


References

Almeida, A. & A. Braun 1986. “Richtig” und “falsch” in phonetischer Transkription: Vorschläge zum Vergleich von Transkriptionen mit Beispielen aus deutschen Dialekten. Zeitschrift für Dialektologie und Linguistik 53(2). 158–172.

Aurrekoetxea, Gotzon, Karmele Fernandez-Aguirre, Jesús Rubio, Borja Ruiz & Jon Sánchez 2013. ‘DiaTech’: A new tool for dialectology. Literary and Linguistic Computing 28(1). 23–30.

Bartelds, Martijn, Caitlin Richter, Mark Liberman & Martijn Wieling 2020. A new acoustic-based pronunciation distance measure. Frontiers in Artificial Intelligence 3.

Goebl, Hans 2006. Recent advances in Salzburg dialectometry. Literary and Linguistic Computing 21(4). 411–435.

Kessler, Brett 1995. Computational dialectology in Irish Gaelic. In Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics, 60–67, Dublin. EACL.

Kleiweg, Peter & John Nerbonne 2023. RuG/L04 [computer software]. http://www.let.rug.nl/~kleiweg/L04/ (accessed 30 October 2023).

Leinonen, Therese, Çağrı Çöltekin & John Nerbonne 2016. Using Gabmap. Lingua 178. 71–83.

Levenshtein, Vladimir I. 1965. Binary codes capable of correcting spurious insertions and deletions of ones. Problems of Information Transmission 1(1). 8–17.

Nerbonne, John, Rinke Colen, Charlotte Gooskens, Peter Kleiweg & Therese Leinonen 2011. Gabmap: a web application for dialectology. Dialectologia: revista electrònica. 65–89.

Séguy, J. 1973. La dialectométrie dans l'Atlas linguistique de la Gascogne. Revue de linguistique romane 37. 1–24.

Wieling, Martijn, Jelena Prokić & John Nerbonne 2009. Evaluating the pairwise alignment of pronunciations. In Lars Borin & Piroska Lendvai (eds.), Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH - SHELT&R 2009). Workshop at the 12th Meeting of the European Chapter of the Association for Computational Linguistics. Athens, 30 March 2009. 26–34.

Period: 10 Jul 2024
Event title: ICLaVE 12: International Conference on Language Variation in Europe
Event type: Conference
Location: Vienna, Austria
Degree of recognition: International