Dialectometry measures differences between dialects using procedures that may involve many independently varying parameters, which must be specified in combination to arrive at a measure of difference. The number of parameters and their possible interactions raise the problem of how to choose parameter values, and combinations of them, intelligently. This paper proceeds from the assumption that dialectology proper must reveal geographic coherence in language variation; on that basis it proposes a yardstick for comparing measurements made under various parameter settings, and it presents some results of applying that yardstick.
Dialectometry produces aggregate distance matrices in which a distance is specified for each pair of sites. By projecting groups obtained by clustering onto geography one compares results with traditional dialectology, which produced maps partitioned into implicitly non-overlapping dialect areas. The importance of dialect areas has been challenged by proponents of continua, but they too need to compare their findings to older literature, expressed in terms of areas. Simple clustering is unstable, meaning that small differences in the input matrix can lead to large differences in results (Jain et al. 1999). This is illustrated with a 500-site data set from Bulgaria, where input matrices which correlate very highly (r = 0.97) still yield very different clusterings. Kleiweg et al. (2004) introduce composite clustering, in which random noise is added to matrices during repeated clustering. The resulting borders are then projected onto the map. The present contribution compares Kleiweg et al.'s procedure to resampled bootstrapping, and also shows how the same procedure used to project borders from composite clustering may be used to project borders from bootstrapping.
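The bootstrapping step mentioned above can be illustrated with a minimal sketch. It assumes that the aggregate matrix is the mean of per-word site-by-site distance matrices and that bootstrapping resamples the words with replacement; the abstract does not spell out either detail. The names `PER_WORD` and `bootstrap_matrix` and the toy values are hypothetical.

```python
import random

# Hypothetical toy data: one 3x3 site-by-site distance matrix per word.
PER_WORD = [
    [[0, 1, 1], [1, 0, 0], [1, 0, 0]],
    [[0, 0, 1], [0, 0, 1], [1, 1, 0]],
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
]

def bootstrap_matrix(per_word, rng):
    """Return one bootstrap replicate of the aggregate distance matrix.

    Words are drawn with replacement (assumption: words are the resampled
    unit) and the replicate is the mean distance over the drawn words.
    """
    sample = [rng.choice(per_word) for _ in per_word]
    n = len(per_word[0])
    return [
        [sum(m[i][j] for m in sample) / len(sample) for j in range(n)]
        for i in range(n)
    ]

rng = random.Random(1)  # fixed seed so that runs are repeatable
replicates = [bootstrap_matrix(PER_WORD, rng) for _ in range(100)]
```

Each replicate would then be clustered in the same way as the original matrix, and the borders that recur across replicates are the ones worth projecting onto the map.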
We examine various string distance measures for suitability in modeling dialect distance, especially its perception. We find that measures which do not normalize for word length, but which are sensitive to order, are superior. We likewise find evidence for the superiority of measures which incorporate a sensitivity to phonological context, realized in the form of n-grams, although we cannot identify which form of context (bigram, trigram, etc.) is best. However, we find no clear benefit in using gradual as opposed to binary segmental difference when calculating sequence distances.
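The distinctions drawn in this abstract can be made concrete with a small sketch. It assumes that the order-sensitive measure is Levenshtein distance with binary segment costs, that context sensitivity is obtained by comparing sequences of n-grams rather than single segments, and that normalization divides by the length of the longer word; the abstract fixes none of these details, and all function names are hypothetical.

```python
def levenshtein(a, b):
    """Unnormalized edit distance between two sequences, binary costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]

def ngrams(word, n=2, pad="#"):
    """Split a word into overlapping n-grams, padding the word edges."""
    padded = pad * (n - 1) + word + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_distance(a, b, n=2):
    """Edit distance over n-gram sequences, so context affects each match."""
    return levenshtein(ngrams(a, n), ngrams(b, n))

def normalized(a, b):
    """Length-normalized variant (assumption: divide by the longer word)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0

print(levenshtein("kitten", "sitting"))
print(ngram_distance("ab", "ba"))
```

Because a segment only matches when its neighbours match too, the bigram variant penalizes a changed segment more than once, which is one way of modeling sensitivity to phonological context.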
A composite cluster map displays a fuzzy categorisation of geographic areas. It combines information from several sources to provide a visualisation of the significance of cluster borders. The basic technique renders the chance that two neighbouring locations are members of different clusters as the darkness of the border that is drawn between those two locations. Adding noise to the clustering process is one way to obtain an estimate of how fixed a border is. We verify the reliability of our technique by comparing a composite cluster map with results obtained using multi-dimensional scaling.
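The border estimate described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the clustering is average linkage, the noise is uniform with a ceiling set to a fraction of the largest distance, and the names `cluster`, `border_strengths`, `DIST`, and `NEIGHBOURS` are hypothetical.

```python
import random
from itertools import combinations

def cluster(dist, k):
    """Average-linkage agglomerative clustering of sites 0..n-1 into k groups."""
    groups = [[i] for i in range(len(dist))]

    def mean_distance(pair):
        a, b = groups[pair[0]], groups[pair[1]]
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(groups) > k:
        # Merge the two groups with the smallest mean inter-site distance.
        a, b = min(combinations(range(len(groups)), 2), key=mean_distance)
        groups[a] += groups[b]
        del groups[b]  # safe: combinations guarantees a < b
    return groups

def border_strengths(dist, neighbours, k=2, runs=100, noise=0.2, seed=0):
    """Fraction of noisy clustering runs that separate each neighbour pair."""
    rng = random.Random(seed)
    n = len(dist)
    ceiling = noise * max(max(row) for row in dist)  # assumed noise scale
    split = {pair: 0 for pair in neighbours}
    for _ in range(runs):
        noisy = [[0.0] * n for _ in range(n)]
        for i, j in combinations(range(n), 2):
            noisy[i][j] = noisy[j][i] = dist[i][j] + rng.uniform(0, ceiling)
        label = {}
        for g, members in enumerate(cluster(noisy, k)):
            for site in members:
                label[site] = g
        for i, j in neighbours:
            split[(i, j)] += label[i] != label[j]
    return {pair: count / runs for pair, count in split.items()}

# Hypothetical toy data: four sites on a line, with a large gap in the middle.
DIST = [
    [0, 1, 10, 11],
    [1, 0, 9, 10],
    [10, 9, 0, 1],
    [11, 10, 1, 0],
]
NEIGHBOURS = [(0, 1), (1, 2), (2, 3)]
print(border_strengths(DIST, NEIGHBOURS))
```

Each returned value lies between 0 and 1 and could be mapped directly to the darkness of the border drawn between the two sites. In this toy example the gap between sites 1 and 2 exceeds the largest possible noise, so that border is drawn in every run and the other two never are.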