Peter Kleiweg scite author profile

Journal of Quantitative Linguistics

2007

Dialectometry measures the differences between dialects in ways that may involve many independently varying parameters, which must be specified in combination in order to arrive at measures of difference. The large number of measurement parameters, and their possible interactions, raise the problem of how to choose parameter values and combinations of them intelligently. Proceeding from the assumption that dialectology proper must reveal geographic coherence in language variation, this paper proposes a yardstick with which to compare measurements made using various parameter settings, and it presents some results of its application.
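The abstract does not spell out the yardstick itself. As a rough illustration of scoring a parameter setting by geographic coherence, the Python sketch below uses a simplified, hypothetical score (`local_incoherence`, not the measure defined in the paper). For each site it compares how far away, on the map, its k linguistically nearest sites lie with how far away its k geographically nearest sites lie. It assumes planar (x, y) coordinates, distinct site locations, and a symmetric linguistic distance matrix; the toy data stand in for the output of two hypothetical parameter settings.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def geo_dist(a: Point, b: Point) -> float:
    # Planar Euclidean distance; assumes projected (x, y) coordinates.
    return math.hypot(a[0] - b[0], a[1] - b[1])


def local_incoherence(ling: Sequence[Sequence[float]],
                      coords: Sequence[Point], k: int = 2) -> float:
    """Hypothetical coherence score (lower is better).

    For each site, sum the geographic distances to its k linguistically
    nearest sites and divide by the same sum for its k geographically
    nearest sites. A ratio of 1 means the linguistic neighbourhood is as
    compact on the map as it can be; the score is the mean excess over 1.
    """
    n = len(coords)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        by_ling = sorted(others, key=lambda j: ling[i][j])[:k]
        by_geo = sorted(others, key=lambda j: geo_dist(coords[i], coords[j]))[:k]
        spread_ling = sum(geo_dist(coords[i], coords[j]) for j in by_ling)
        spread_geo = sum(geo_dist(coords[i], coords[j]) for j in by_geo)
        total += spread_ling / spread_geo - 1.0
    return total / n


# Toy data: four sites on a line, scored under two hypothetical settings.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
coherent = [[abs(i - j) for j in range(4)] for i in range(4)]
scrambled = [[0, 2, 3, 1],
             [2, 0, 1, 3],
             [3, 1, 0, 2],
             [1, 3, 2, 0]]
print(local_incoherence(coherent, coords, k=1))   # 0.0
print(local_incoherence(scrambled, coords, k=1))  # 1.0
```

Under such a score, the setting whose distances make the two outermost sites look most similar is penalized, which is the kind of comparison between parameter settings the abstract describes.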

Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering

Heeringa

Manni

et al. 2008

Abstract. Dialectometry produces aggregate distance matrices in which a distance is specified for each pair of sites. By projecting groups obtained by clustering onto geography, one compares results with traditional dialectology, which produced maps partitioned into implicitly non-overlapping dialect areas. The importance of dialect areas has been challenged by proponents of continua, but they too need to compare their findings to older literature, expressed in terms of areas. Simple clustering is unstable, meaning that small differences in the input matrix can lead to large differences in results (Jain et al. 1999). This is illustrated with a 500-site data set from Bulgaria, where input matrices which correlate very highly (r = 0.97) still yield very different clusterings. Kleiweg et al. (2004) introduce composite clustering, in which random noise is added to matrices during repeated clustering. The resulting borders are then projected onto the map. The present contribution compares Kleiweg et al.'s procedure to resampled bootstrapping, and also shows how the same procedure used to project borders from composite clustering may be used to project borders from bootstrapping.
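Bootstrap clustering can be sketched in Python under several stated assumptions: the resampling unit is the linguistic variable (one column of categorical variants per site), the aggregate distance is the proportion of sampled variables on which two sites differ, and each replicate is clustered with average linkage (UPGMA) into `k` groups. The result is, for each pair of sites, the fraction of replicates in which they end up in different clusters; for neighbouring sites, this is the kind of quantity that can be projected onto the map as a border. The function names and toy data are hypothetical, not taken from the paper.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def average_linkage_labels(d: Matrix, k: int) -> List[int]:
    """Agglomerative clustering with average linkage (UPGMA), cut at k groups.

    Brute force: fine for toy data, far too slow for hundreds of sites.
    """
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [d[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge b into a (b > a, so index a stays valid)
        del clusters[b]
    labels = [0] * len(d)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def aggregate_distance(variants: Sequence[Sequence[str]], cols: Sequence[int]) -> Matrix:
    # Share of the sampled variables on which two sites use different variants.
    n = len(variants)
    return [[sum(variants[i][c] != variants[j][c] for c in cols) / len(cols)
             for j in range(n)] for i in range(n)]


def bootstrap_separation(variants: Sequence[Sequence[str]], k: int,
                         reps: int = 200, seed: int = 0) -> Matrix:
    """Fraction of bootstrap replicates in which each pair of sites is split."""
    rng = random.Random(seed)
    n, m = len(variants), len(variants[0])
    sep = [[0.0] * n for _ in range(n)]
    for _ in range(reps):
        cols = [rng.randrange(m) for _ in range(m)]  # resample variables with replacement
        labels = average_linkage_labels(aggregate_distance(variants, cols), k)
        for i in range(n):
            for j in range(n):
                if labels[i] != labels[j]:
                    sep[i][j] += 1
    return [[count / reps for count in row] for row in sep]


# Toy data: 4 sites x 4 variables (e.g. the variant used for each of 4 words).
variants = [["a", "a", "a", "a"],
            ["a", "a", "a", "b"],
            ["b", "b", "b", "b"],
            ["b", "b", "b", "a"]]
sep = bootstrap_separation(variants, k=2)
print(sep[0][1], sep[0][2])
```

Because sites 0 and 2 differ on every variable, they are split in every replicate, while sites 0 and 1 are split only in the few replicates that over-sample the single variable on which they disagree.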

Evaluation of string distance algorithms for dialectology

Heeringa

Gooskens

et al. 2006

We examine various string distance measures for their suitability in modeling dialect distance, especially its perception. We find that measures which do not normalize for word length, but which are sensitive to order, are superior. We likewise find evidence for the superiority of measures which incorporate sensitivity to phonological context, realized in the form of n-grams, although we cannot identify which form of context (bigram, trigram, etc.) is best. However, we find no clear benefit in using gradual as opposed to binary segmental differences when calculating sequence distances.
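To make these properties concrete, here is a small Python sketch (not the authors' code) of a Levenshtein distance over segment sequences that is not normalized for word length, uses binary (0/1) segment differences, and is sensitive to order. N-gram context is added by running the same algorithm over padded bigrams instead of single segments. Treating each character as one segment and using `#` as the word-boundary pad are simplifying assumptions.

```python
from typing import Hashable, List, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance with unit insertion/deletion cost and binary (0/1)
    substitution cost. Not divided by word length."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def ngrams(word: str, n: int = 2) -> List[str]:
    # Pad with '#' boundary markers so initial and final segments get context too.
    padded = "#" * (n - 1) + word + "#" * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_distance(a: str, b: str, n: int = 2) -> int:
    # The same edit distance, computed over n-grams instead of single segments.
    return levenshtein(ngrams(a, n), ngrams(b, n))


print(levenshtein("melk", "mɛlk"))     # 1
print(ngram_distance("melk", "mɛlk"))  # 2
print(levenshtein("ab", "ba"))         # 2 (order matters)
```

A bag-of-segments comparison would score `ab` and `ba` as identical; the edit distance does not. In the bigram version a single vowel difference changes two bigrams, so each segmental difference is counted together with its immediate context.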

Validating Dialect Comparison Methods

2002

Geographic Projection of Cluster Composites

Bosveld

2004

A composite cluster map displays a fuzzy categorisation of geographic areas. It combines information from several sources to provide a visualisation of the significance of cluster borders. The basic technique renders the chance that two neighbouring locations are members of different clusters as the darkness of the border that is drawn between those two locations. Adding noise to the clustering process is one way to estimate how stable a border is. We verify the reliability of our technique by comparing a composite cluster map with results obtained using multidimensional scaling.
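The basic technique can be sketched in Python as follows. Several details are assumptions made for illustration: the noise model (each distance is increased by uniform noise of up to `noise` times the standard deviation of all pairwise distances), the average-linkage clustering, and the linear mapping from separation probability to grey level. Kleiweg et al. may have parameterized these differently. `neighbours` lists the pairs of adjacent locations, which in practice might come from a tessellation of the site coordinates; all names here are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Pair = Tuple[int, int]


def average_linkage_labels(d: Matrix, k: int) -> List[int]:
    # Brute-force UPGMA cut at k clusters; adequate for toy data only.
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [d[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge b into a (b > a)
        del clusters[b]
    labels = [0] * len(d)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def border_probabilities(d: Matrix, neighbours: Sequence[Pair], k: int,
                         noise: float = 0.5, runs: int = 100,
                         seed: int = 1) -> Dict[Pair, float]:
    """Estimate, for each pair of neighbouring sites, the chance that
    clustering a noise-perturbed copy of d puts them in different clusters."""
    rng = random.Random(seed)
    n = len(d)
    sd = statistics.pstdev([d[i][j] for i in range(n) for j in range(i + 1, n)])
    split = {pair: 0 for pair in neighbours}
    for _ in range(runs):
        noisy = [list(row) for row in d]
        for i in range(n):
            for j in range(i + 1, n):
                # Assumed noise model: uniform noise in [0, noise * sd], kept symmetric.
                noisy[i][j] = noisy[j][i] = d[i][j] + rng.uniform(0.0, noise * sd)
        labels = average_linkage_labels(noisy, k)
        for i, j in neighbours:
            if labels[i] != labels[j]:
                split[(i, j)] += 1
    return {pair: count / runs for pair, count in split.items()}


def grey_level(p: float) -> int:
    # 0 = black (always a border), 255 = white (never a border).
    return round(255 * (1.0 - p))


# Toy data: four sites on a line; the main break lies between sites 1 and 2.
d = [[0, 1, 5, 6],
     [1, 0, 4, 5],
     [5, 4, 0, 1],
     [6, 5, 1, 0]]
neighbours = [(0, 1), (1, 2), (2, 3)]
probs = border_probabilities(d, neighbours, k=2)
print({pair: grey_level(p) for pair, p in probs.items()})
# {(0, 1): 255, (1, 2): 0, (2, 3): 255}
```

In this toy example the noise is too small to move the break, so the border between sites 1 and 2 is drawn solid black and the others are not drawn; with less clear-cut data, intermediate probabilities would yield the grey, fuzzy borders the abstract describes.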