Relationships among Japanese local dialectal words by Levenshtein distance and multivariate analysis

Activity: Talk or presentation, Academic

Description

The Linguistic Atlas of Japan Database (LAJDB) is a large data set containing transcriptions for no fewer than 2,400 local Japanese dialects, with up to 37 items per location. Using these data, we will answer the following questions:

1. What are the relationships among the 2,400 varieties, and what areas do they represent?
2. What are the relationships among the 37 items, and what dimensions do they represent?

Using led-a.org, we applied PMI Levenshtein distance (Wieling 2012) to the 2,400 localities and 37 items of the LAJDB. Ward’s clustering and t-distributed stochastic neighbor embedding (t-SNE) were then applied to the resulting distances among the 2,400 localities. With Ward’s clustering we found five natural groups, which we projected onto a map. Using three-dimensional t-SNE we created a map showing the dialect continuum. The results are plausible and consistent with earlier studies such as Fujiwara (1990), Inoue (2001) and others. They reinforce, rather than contradict, the results of previous Japanese dialectologists (Heeringa & Inoue 2023).
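
The distance and clustering steps can be illustrated with a minimal sketch. It makes several assumptions that are not part of the study: it uses plain unit-cost Levenshtein distance rather than the PMI-weighted variant, it normalises by the length of the longer transcription, the four localities and their transcriptions are invented toy data, and SciPy's Ward linkage is applied directly to the distance matrix (Ward's method formally assumes Euclidean distances).

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance with unit costs (not the PMI-weighted variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def locality_distance(x: dict, y: dict) -> float:
    """Mean length-normalised distance over the items both localities have."""
    shared = [k for k in x if k in y]
    return sum(levenshtein(x[k], y[k]) / max(len(x[k]), len(y[k]))
               for k in shared) / len(shared)


# Hypothetical toy data: locality -> {item: transcription}.
data = {
    "A": {"item1": "kata", "item2": "mizu", "item3": "hana"},
    "B": {"item1": "kata", "item2": "mizu", "item3": "hano"},
    "C": {"item1": "gado", "item2": "midu", "item3": "pana"},
    "D": {"item1": "gado", "item2": "midu"},  # item3 is missing here
}
names = sorted(data)
dist = np.zeros((len(names), len(names)))
for i, j in combinations(range(len(names)), 2):
    dist[i, j] = dist[j, i] = locality_distance(data[names[i]], data[names[j]])

# Ward's method on the condensed distance matrix, cut into two groups.
tree = linkage(squareform(dist), method="ward")
groups = fcluster(tree, t=2, criterion="maxclust")
print(dict(zip(names, groups)))
```

In this toy example, A and B end up in one group and C and D in the other. Averaging only over shared items is one simple way to handle locations that lack some of the items; it is an assumption of the sketch, not a description of how led-a.org treats missing data.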

Dialectometry usually focuses on geography, i.e. the dimension of local dialects and their inter-relationships. However, it is also possible to analyze the item dimension and the inter-relationships among items. The two dimensions, dialect variety and item, are two sides of the same coin.

When analyzing the item dimension we followed Nerbonne (2006), who applied factor analysis to identify the linguistic structures that are represented by the average Levenshtein distances. We found two main factors. To understand the meaning of the factors, we applied Ward’s clustering to the factor loadings, which divided the 37 items into three sets:

• a set of 14 items that in particular loaded on the first factor;
• a set of 11 items that in particular loaded on the second factor;
• a set of 12 items loading on neither factor.
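
The grouping of items by their loadings can be sketched as follows. Everything here is an illustrative assumption: the data are synthetic (12 hypothetical items instead of 37, with rows standing for locality pairs and columns for per-item distances), and unrotated principal-component loadings of the item correlation matrix are used as a simple stand-in for the factor analysis, whose exact settings are not given here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
n_pairs = 500  # hypothetical number of locality pairs

# Two hypothetical latent patterns of geographic variation.
f1 = rng.normal(size=n_pairs)
f2 = rng.normal(size=n_pairs)


def noisy(signal):
    """An item that follows a latent pattern plus some item-specific noise."""
    return signal + 0.3 * rng.normal(size=n_pairs)


columns = (
    [noisy(f1) for _ in range(5)]                     # items driven by pattern 1
    + [noisy(f2) for _ in range(4)]                   # items driven by pattern 2
    + [rng.normal(size=n_pairs) for _ in range(3)]    # items driven by neither
)
X = np.column_stack(columns)  # rows: locality pairs, columns: items

# Loadings on the two largest components of the item correlation matrix.
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
top = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])   # shape (12, 2)

# Ward's clustering of the items in the two-dimensional loading space.
tree = linkage(loadings, method="ward")
item_sets = fcluster(tree, t=3, criterion="maxclust")
print(item_sets)
```

Items that load mainly on one component lie far from the origin along that axis, while items that load on neither lie near the origin, so a three-cluster cut separates the three kinds of item in this synthetic setting.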

Subsequently, for each of the three sets we measured Levenshtein distances using only the items in that set. Applying Ward’s clustering to the distances obtained for each set gave the following results:

• items with high loadings on factor 1 represent a tripartite division: north, central and the southern islands;
• items with high loadings on factor 2 represent a dichotomy with a border in the south of the mainland;
• items close to the origin (loading on neither factor) represent a dichotomy with a border in the center of the mainland.

When the three divisions are superimposed, the resulting map is almost identical to the five-area map that we initially obtained from the full set of 37 items.
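
One way to read "superimposing" is as a cross-classification: two localities fall in the same area only if all three divisions agree on them. The sketch below illustrates that reading. The ten localities, their north-to-south ordering and the positions of the borders are invented for illustration and do not reproduce the actual maps.

```python
# Hypothetical toy layout: ten localities ordered from north to south.
localities = [f"loc{i}" for i in range(10)]

# Three hypothetical partitions, one per item set (labels are arbitrary).
by_factor1 = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2]  # tripartite division
by_factor2 = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # dichotomy, southern border
by_neither = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # dichotomy, central border


def superimpose(*partitions):
    """Give localities the same area id only if they agree in every partition."""
    area_ids = {}
    result = []
    for combo in zip(*partitions):
        # A new combination of labels opens a new area.
        area_ids.setdefault(combo, len(area_ids))
        result.append(area_ids[combo])
    return result


areas = superimpose(by_factor1, by_factor2, by_neither)
print(dict(zip(localities, areas)))
```

With four non-coinciding borders along this toy chain, the superimposed result has five areas.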



References

Heeringa, Wilbert & Fumio Inoue (2023), Exploring the Japanese Dialect Geography Dialectometrically: Division and Continuity. Studies in Geolinguistics 3, 1-44.

Fujiwara, Yoichi (1990) Nihongo Hogen Bunpa ron [Dialect Propagation Theory of Japanese] Tokyo: Musashino Shoin.

Inoue, Fumio (2001) Keiryoteki Hogen Kukaku [Quantificational Dialect Classification] Tokyo: Meiji Shoin.

Nerbonne, J. (2006). Identifying linguistic structure in aggregate comparison. In: J. Nerbonne and W. Kretzschmar, Jr. (eds.), Progress in Dialectometry, special issue of Literary and Linguistic Computing, 21(4), 463-475, selected proceedings of a workshop at Methods in Dialectology XII, Moncton, Aug. 5, 2005. https://doi.org/10.1093/llc/fql041.

Wieling, Martijn (2012). A quantitative approach to social and geographical dialect variation. Doctoral dissertation, University of Groningen. URI: https://hdl.handle.net/11370/cd637817-572f-4826-98c1-08272775fb64
Period: 1 Jul 2024
Event title: Methods XVIII: Eighteenth International Conference on Methods in Dialectology
Event type: Conference
Location: Melbourne, Australia
Degree of recognition: International