ABSTRACT

The Levenshtein dialect distance method has proven to be a successful method for measuring phonetic distances between Dutch dialects. The aim of the present investigation is to validate the Levenshtein dialect distance with perceptual data from a language area other than the Dutch, namely Norway. We calculate the correlation between the Levenshtein distances and the distances between 15 Norwegian dialects as judged by Norwegian listeners. We carry out this analysis to see the degree to which the average Levenshtein distances correspond to the psychoacoustic perception of the speakers of the dialects.

The present article reports on part of a study supported by NWO, the Netherlands Organization for Scientific Research. We are grateful for the permission from Kristian Skarbø and Jørn Almberg to use their material and for the help of Jørn Almberg during the whole investigation. We thank Saakje van Dellen for her obliging help with the data entry and Peter Kleiweg for letting us use the programs that he developed for the visualization of the maps and dendrograms in this article. Finally, we would like to thank John Nerbonne for valuable comments and for correcting our English.
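The validation step described in the abstract above, correlating computed distances with perceived distances, can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: it assumes both distance matrices are symmetric, uses a plain Pearson correlation over the entries above the diagonal, and the three-dialect matrices `computed` and `perceived` are invented toy values, not data from the study.

```python
from math import sqrt


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy matrices for three dialects (assumed symmetric).
computed = [[0.0, 2.0, 5.0],
            [2.0, 0.0, 4.0],
            [5.0, 4.0, 0.0]]
perceived = [[0.0, 1.0, 7.0],
             [1.0, 0.0, 4.0],
             [7.0, 4.0, 0.0]]

r = pearson(upper_triangle(computed), upper_triangle(perceived))
print(f"r = {r:.3f}")
```

Only the upper triangle is used so that each dialect pair is counted once and the zero diagonal does not inflate the correlation. If listener judgments are not symmetric, the full off-diagonal matrix would have to be compared instead.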
Wilbert Heeringa and John Nerbonne
University of Groningen

The organizing concept behind dialect variation is still seen predominantly as the areas within which similar varieties are spoken. The opposing view, that dialects are organized in a continuum without sharp boundaries, is likewise popular. This article introduces a new element into the discussion: the opportunity to view dialectal differences in the aggregate. We employ a dialectometric technique that provides an additive measure of pronunciation difference, the (aggregate) pronunciation distance. This allows us to determine how much of the linguistic variation is accounted for by geography. In our sample of 27 Dutch towns and villages, the variation ranges between 65% and 81%, which lends credence to the continuum view. The borders of well-established dialect areas nonetheless show large deviations from the expected aggregate pronunciation distance. We pay particular attention to a puzzle concerning the subjective perception of continua introduced by Chambers and Trudgill (1998): a traveller walking in a straight line from village to village notices successive small changes, but seldom, if ever, observes large differences. This sounds like a justification of the continuum view, but there is an added twist. Might the traveller be misled by the perspective of most recent memory? We use the Chambers-Trudgill puzzle to organize our argument at several points.
Abstract. Dialectometry produces aggregate distance matrices in which a distance is specified for each pair of sites. By projecting groups obtained by clustering onto geography one compares results with traditional dialectology, which produced maps partitioned into implicitly non-overlapping dialect areas. The importance of dialect areas has been challenged by proponents of continua, but they too need to compare their findings to older literature, expressed in terms of areas.

Simple clustering is unstable, meaning that small differences in the input matrix can lead to large differences in results (Jain et al. 1999). This is illustrated with a 500-site data set from Bulgaria, where input matrices which correlate very highly (r = 0.97) still yield very different clusterings. Kleiweg et al. (2004) introduce composite clustering, in which random noise is added to matrices during repeated clustering. The resulting borders are then projected onto the map.

The present contribution compares Kleiweg et al.'s procedure to resampled bootstrapping, and also shows how the same procedure used to project borders from composite clustering may be used to project borders from bootstrapping.
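The general idea of clustering repeatedly under added noise can be illustrated with a small sketch. It is not Kleiweg et al.'s implementation: the group-average linkage, the uniform noise, the `noise` and `runs` parameters, and the four-site toy matrix are all assumptions made for illustration. The sketch counts, for every pair of sites, how often the two sites end up in different clusters; a pair that is separated in most runs would correspond to a stable border.

```python
import random


def average_linkage(dist, k):
    """Group-average agglomerative clustering of sites 0..n-1 into k groups."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a stays valid
    return clusters


def add_noise(dist, scale, rng):
    """Return a symmetric copy of dist with uniform noise in [0, scale] added."""
    n = len(dist)
    noisy = [row[:] for row in dist]
    for i in range(n):
        for j in range(i + 1, n):
            noisy[i][j] = noisy[j][i] = dist[i][j] + rng.uniform(0, scale)
    return noisy


def separation_frequency(dist, k, runs=100, noise=0.5, seed=0):
    """Fraction of noisy clustering runs in which each pair of sites is split."""
    rng = random.Random(seed)
    n = len(dist)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        groups = average_linkage(add_noise(dist, noise, rng), k)
        label = {site: g for g, members in enumerate(groups) for site in members}
        for i in range(n):
            for j in range(i + 1, n):
                if label[i] != label[j]:
                    counts[i][j] += 1
                    counts[j][i] += 1
    return [[c / runs for c in row] for row in counts]


# Hypothetical toy matrix: sites 0-1 and sites 2-3 form two tight pairs.
toy = [[0, 1, 10, 10],
       [1, 0, 10, 10],
       [10, 10, 0, 1],
       [10, 10, 1, 0]]
freq = separation_frequency(toy, k=2)
print(freq[0][1], freq[0][2])
```

In this toy case the two groups are so well separated that the noise never changes the outcome. With real data, pairs of neighbouring sites whose separation frequency is close to neither 0 nor 1 are the ones where a single clustering run would be unreliable.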
We examine various string distance measures for suitability in modeling dialect distance, especially its perception. We find measures superior which do not normalize for word length, but which are sensitive to order. We likewise find evidence for the superiority of measures which incorporate a sensitivity to phonological context, realized in the form of n-grams, although we cannot identify which form of context (bigram, trigram, etc.) is best. However, we find no clear benefit in using gradual as opposed to binary segmental difference when calculating sequence distances.
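The design choices named above (length normalization, n-gram context, binary segment costs) can be made concrete with a small sketch. It is an illustration under stated assumptions rather than any of the measures actually evaluated: each character is treated as one segment, every operation costs 1, n-grams are padded with a hypothetical boundary symbol `#`, and normalization divides by the length of the longer sequence, which is only one of several possible choices.

```python
def ngrams(segments, n):
    """Overlapping n-grams of a segment sequence, padded with '#' at both edges."""
    padded = ["#"] * (n - 1) + list(segments) + ["#"] * (n - 1)
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions (cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute or match
        prev = cur
    return prev[-1]


def word_distance(w1, w2, n=1, normalize=False):
    """Edit distance over n-grams, optionally divided by the longer length."""
    a, b = ngrams(w1, n), ngrams(w2, n)
    d = levenshtein(a, b)
    if normalize:
        longest = max(len(a), len(b))
        return d / longest if longest else 0.0
    return d


# Two illustrative pronunciations of a word meaning 'milk'.
print(word_distance("mɛlk", "mɛlək"))                  # unigram, raw
print(word_distance("mɛlk", "mɛlək", n=2))             # bigram, raw
print(word_distance("mɛlk", "mɛlək", normalize=True))  # unigram, normalized
```

With bigrams, a single inserted segment disturbs two adjacent units, so the same change weighs more than it does in the unigram measure; this is one simple way a measure can become sensitive to phonological context.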