Abstract: The aim of this paper is to present a new method for identifying linguistic structure in the aggregate analysis of language variation. The method consists of extracting the most frequent sound correspondences from the aligned transcriptions of words. Based on the extracted correspondences, every site is compared to all other sites, and a correspondence index is calculated for each site. This method enables us to identify sound alternations responsible for dialect divisions and to measure the extent to which…
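The extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the transcriptions are already aligned (equal length, with a gap symbol where needed) and that segments are single characters. The word pairs are invented toy data, and `correspondence_share` is a hypothetical stand-in: the abstract does not give the formula of the correspondence index, so this is not the published measure.

```python
from collections import Counter

def count_correspondences(aligned_pairs):
    """Count non-identical aligned segment pairs; order within a pair is ignored."""
    counts = Counter()
    for a, b in aligned_pairs:
        if len(a) != len(b):
            raise ValueError("aligned transcriptions must have equal length")
        for x, y in zip(a, b):
            if x != y:
                counts[tuple(sorted((x, y)))] += 1
    return counts

def correspondence_share(alignments_by_site, correspondence):
    """Hypothetical index (an assumption, not the published one): the share of
    comparison sites whose alignments show `correspondence` at least once."""
    key = tuple(sorted(correspondence))
    if not alignments_by_site:
        return 0.0
    hits = sum(
        1 for pairs in alignments_by_site.values()
        if count_correspondences(pairs)[key] > 0
    )
    return hits / len(alignments_by_site)

# Invented toy alignments between one site and another (not real dialect data).
toy = [
    ("bel", "bal"),
    ("mleko", "mlako"),
    ("den", "den"),
    ("zvezda", "zvezdə"),
]

counts = count_correspondences(toy)
print(counts.most_common(2))
```

Ranking the counter with `most_common` gives the most frequent correspondences; the per-site share then asks how widely one of them separates a site from the others.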
“…The pronunciation differences were analysed using the procedure sketched in §3, and these correlate strongly with logarithmic geographical distances (r = 0.469). Prokić (2007) obtained data on Bulgarian dialectology from Prof. Vladimir Zhobov's group at St Clement of Ohrid's University of Sofia. Prokić worked on broad phonetic transcriptions of 156 words from 197 sampling sites in Bulgaria.…”
Section: A Dialectometric View Of Gravity
We examine situations in which linguistic changes have probably been propagated via normal contact as opposed to via conquest, recent settlement and large-scale migration. We proceed then from two simplifying assumptions: first, that all linguistic variation is the result of either diffusion or independent innovation, and, second, that we may operationalize social contact as geographical distance. It is clear that both of these assumptions are imperfect, but they allow us to examine diffusion via the distribution of linguistic variation as a function of geographical distance. Several studies in quantitative linguistics have examined this relation, starting with Séguy (Séguy 1971, Rev. Linguist. Romane 35), and virtually all report a sublinear growth in aggregate linguistic variation as a function of geographical distance. The literature from dialectology and historical linguistics has mostly traced the diffusion of individual features, however, so that it is sensible to ask what sort of dynamic in the diffusion of individual features is compatible with Séguy's curve. We examine some simulations of diffusion in an effort to shed light on this question.
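One simple way to probe the sublinear relation described above is to compare how well linguistic distance correlates with raw versus logarithmic geographical distance. The sketch below is only an illustration: the distances are synthetic, built to grow logarithmically, and a logarithm is just one possible sublinear form. The names `geo_km` and `ling` are assumptions introduced here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic data (an assumption): linguistic distance grows with log distance.
geo_km = [5, 10, 20, 40, 80, 160, 320]
ling = [0.10 + 0.07 * math.log2(d / 5) for d in geo_km]

r_raw = pearson(geo_km, ling)
r_log = pearson([math.log(d) for d in geo_km], ling)
print(f"r (raw km) = {r_raw:.3f}, r (log km) = {r_log:.3f}")
```

On real data neither correlation would be perfect; a higher value for the logarithmic version is what a sublinear curve of this kind would lead one to expect.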
“…)’ are at the top of the list, all showing strong correlations (r > 0.5) with the first, most significant dimension in the MDS solution, suggesting that the variation in the stressed vowel (standard German [ai], but South [i]) is the single strongest indicator of provenance among the 201 words in our sample. Prokić (2007) explores more systematic analysis of the aligned segments with the goals of identifying the linguistic factors in aggregate analysis.…”
Most studies of language variation proceed from the geographic or social distribution of single elements (features), and find it difficult to proceed further. Data‐driven dialectology, and more generally, data‐driven variationist studies, begin instead from an aggregate view of language variation and reap immediate benefits in dealing with well‐known exceptions in the distributions of single features and in avoiding the need to select which features to use as the basis of characterizations. But the major advance is the opportunity to characterize general tendencies in linguistic variation.
“…Nerbonne [10] examined the distance matrices induced by each of two hundred vowel pronunciations automatically extracted from a large American collection, and subsequently applied factor analysis to the covariance matrices obtained from the collection of vowel distance matrices. Prokić [11] analyzed Bulgarian pronunciation using an edit distance algorithm and then collected commonly aligned sounds. She developed an index to measure how characteristic a given sound correspondence is for a given site.…”
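The alignment step mentioned in this statement can be sketched with a plain Levenshtein alignment. This is a minimal illustration with unit costs for substitution, insertion and deletion; the cost settings of the cited work are not stated here, so the costs, the gap symbol and the example words are assumptions.

```python
def align(a, b, gap="-"):
    """Return (edit distance, aligned segment pairs) under unit costs."""
    n, m = len(a), len(b)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)

    # Trace back from the bottom-right cell, preferring match/substitution.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((a[i - 1], gap))  # segment of a aligned with a gap
            i -= 1
        else:
            pairs.append((gap, b[j - 1]))  # segment of b aligned with a gap
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs

distance, pairs = align("mleko", "mlako")
print(distance, [p for p in pairs if p[0] != p[1]])
```

Collecting the non-identical pairs over many words and site pairs yields the pool of commonly aligned sounds that the statement refers to.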
In this study we use bipartite spectral graph partitioning to simultaneously cluster varieties and identify their most distinctive linguistic features in Dutch dialect data. While clustering geographical varieties with respect to their features, e.g. pronunciation, is not new, the simultaneous identification of the features which give rise to the geographical clustering presents novel opportunities in dialectometry. Earlier methods aggregated sound differences and clustered on the basis of aggregate differences. The determination of the significant features which co-vary with cluster membership was carried out on a post hoc basis. Bipartite spectral graph clustering simultaneously seeks groups of individual features which are strongly associated, even while seeking groups of sites which share subsets of these same features. We show that the application of this method results in clear and sensible geographical groupings and discuss and analyze the importance of the concomitant features.
Key words: Bipartite spectral graph partitioning, Clustering, Sound correspondences, Dialectometry, Dialectology, Language variation
This paper is an extended version of the study 'Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences in dialectology' by Martijn Wieling and John Nerbonne [1]
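A two-way version of bipartite spectral partitioning can be sketched as follows. It normalises a site-by-feature matrix by the square roots of its row and column sums, finds the second singular vector by power iteration, and splits sites and features by sign. The toy matrix is invented, the sign split is a simplification (a k-means step on the spectral embedding is the more general choice), and nothing here reproduces the paper's data or exact procedure.

```python
import math

def cocluster_two_way(A, iters=500):
    """Split rows (sites) and columns (features) of a non-negative matrix in two."""
    n, m = len(A), len(A[0])
    d1 = [sum(row) for row in A]                              # site degrees
    d2 = [sum(A[i][j] for i in range(n)) for j in range(m)]   # feature degrees
    if min(d1) <= 0 or min(d2) <= 0:
        raise ValueError("every site and feature needs at least one link")
    # Normalised matrix An = D1^(-1/2) A D2^(-1/2).
    An = [[A[i][j] / math.sqrt(d1[i] * d2[j]) for j in range(m)] for i in range(n)]
    # The leading right singular vector is known in closed form: sqrt(d2), normalised.
    total = sum(d2)
    v1 = [math.sqrt(x / total) for x in d2]
    # Power iteration on An^T An, projecting out v1 to reach the second vector.
    v = [(-1.0) ** j * (j + 1) for j in range(m)]  # arbitrary deterministic start
    for _ in range(iters):
        dot = sum(x * y for x, y in zip(v, v1))
        v = [x - dot * y for x, y in zip(v, v1)]
        u = [sum(An[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [sum(An[i][j] * u[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            raise ValueError("start vector has no component along the second vector")
        v = [x / norm for x in v]
    u = [sum(An[i][j] * v[j] for j in range(m)) for i in range(n)]
    # Rescale by D^(-1/2) and split by sign (simplification of the k-means step).
    site_labels = [int(u[i] / math.sqrt(d1[i]) > 0) for i in range(n)]
    feature_labels = [int(v[j] / math.sqrt(d2[j]) > 0) for j in range(m)]
    return site_labels, feature_labels

# Invented toy data: rows are sites, columns are sound correspondences (1 = attested).
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
sites, features = cocluster_two_way(A)
print(sites, features)
```

Because sites and features are labelled from the same singular vector pair, each site cluster comes with the features that characterise it, which is the point the abstract makes about avoiding a post hoc feature search.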