Finding Dialect Areas by Means of Bootstrap Clustering

Research output: Chapter in book/book part, Chapter, Academic

Abstract

In dialectometry, cluster analysis is a means of finding groups given a set of local dialects and their mutual linguistic distances. The weakness of cluster analysis is its instability: small differences in the distance matrix may strongly change the results. Kleiweg, Nerbonne & Bosveld (2004) introduced composite cluster maps, which are obtained by collecting the probabilities that pairs of neighboring locations belong to different clusters; these probabilities are indicated by the darkness of the border drawn between the two locations. Noise is added to the clustering process, which enables the authors to estimate how stable a border is. Nerbonne et al. (2008) use clustering with noise and bootstrap clustering to overcome instability. Both Kleiweg, Nerbonne & Bosveld (2004) and Nerbonne et al. (2008) focus on boundaries, which may be weaker or stronger. We introduce a new flavor of bootstrap clustering that generates areas, similar to classical dialect maps. The procedure consists of four steps. First, we draw 1,000 bootstrap samples, each consisting of n items selected at random with replacement from the n available items. For each resampled set of items we calculate the aggregated distances. Second, on the basis of these distances we perform agglomerative hierarchical cluster analysis. We choose nearest neighbor clustering since this method reflects the idea of dialect areas as continua. On the basis of the resulting tree we determine the number of natural groups by means of the elbow method. Third, for each pair of dialects we count the number of times that both dialects are found in the same natural group. Fourth, when two dialects belong to the same group in more than 95% of the cases, we mark them as ‘connected.’ In this way we obtain networks, which constitute the groups. We apply the procedure to distances in the sound components, measured with Levenshtein distance, between a set of 86 Dutch dialects. We use material that was collected in the period 2008–2011.
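The four steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things are assumptions made for illustration: the input `item_dists` (one dialect-by-dialect distance matrix per item, averaged to obtain the aggregated distances), the function names, the toy data, and above all the elbow rule, which is approximated here by cutting the tree at the largest jump between consecutive merge heights. Nearest neighbor clustering is also known as single linkage; its merges coincide with the edges of a minimum spanning tree, which is what the sketch exploits.

```python
import random
from itertools import combinations

def find(parent, x):
    """Return the root of x in a union-find forest (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def components(n, edges):
    """Label the connected components of an undirected graph on n nodes."""
    parent = list(range(n))
    for i, j in edges:
        parent[find(parent, i)] = find(parent, j)
    return [find(parent, i) for i in range(n)]

def natural_groups(dist):
    """Step 2: nearest neighbor (single-linkage) clustering, cut at the
    largest jump in merge height (an assumed stand-in for the elbow method)."""
    n = len(dist)
    pairs = sorted(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])
    parent, merges = list(range(n)), []
    for i, j in pairs:  # Kruskal: each accepted edge is one single-linkage merge
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            merges.append((dist[i][j], i, j))
    heights = [h for h, _, _ in merges]
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    keep = gaps.index(max(gaps)) + 1 if gaps else len(merges)
    return components(n, [(i, j) for _, i, j in merges[:keep]])

def bootstrap_areas(item_dists, runs=1000, threshold=0.95, seed=0):
    """Steps 1-4: resample items, cluster, count co-membership, connect."""
    rng = random.Random(seed)
    n_items, n = len(item_dists), len(item_dists[0])
    together = [[0] * n for _ in range(n)]
    for _ in range(runs):
        # Step 1: draw n_items items with replacement and aggregate distances.
        sample = [rng.randrange(n_items) for _ in range(n_items)]
        agg = [[sum(item_dists[k][i][j] for k in sample) / n_items
                for j in range(n)] for i in range(n)]
        labels = natural_groups(agg)
        # Step 3: count how often each pair lands in the same natural group.
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                together[i][j] += 1
    # Step 4: connect pairs grouped together in more than `threshold` of runs;
    # the connected components of this network are the resulting areas.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if together[i][j] / runs > threshold]
    return components(n, edges)

def toy_item(w01, w23, between=10):
    """Hypothetical per-item distances for 4 dialects in two tight pairs."""
    d = [[0 if i == j else between for j in range(4)] for i in range(4)]
    d[0][1] = d[1][0] = w01
    d[2][3] = d[3][2] = w23
    return d

toy = [toy_item(1, 2), toy_item(2, 1), toy_item(1, 1), toy_item(2, 2), toy_item(1, 3)]
areas = bootstrap_areas(toy, runs=200)  # one area label per dialect
```

Two dialects belong to the same area when they receive the same label in `areas`. Taking connected components in step 4 means that two dialects can end up in one area through a chain of connected pairs even if they are not directly connected themselves, which matches the description of the groups as networks.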
Original language: English
Title: From Semantics to Dialectometry
Subtitle: Festschrift in honor of John Nerbonne
Publisher: College Publications
Pages: 127-135
Status: Published - 2017

Publication series

Name: Tributes 32
