Identifying linguistic structure in a quantitative analysis of dialect pronunciation

2010

Phil. Trans. R. Soc. B

We examine situations in which linguistic changes have probably been propagated via normal contact as opposed to via conquest, recent settlement and large-scale migration. We proceed then from two simplifying assumptions: first, that all linguistic variation is the result of either diffusion or independent innovation, and, second, that we may operationalize social contact as geographical distance. It is clear that both of these assumptions are imperfect, but they allow us to examine diffusion via the distribution of linguistic variation as a function of geographical distance. Several studies in quantitative linguistics have examined this relation, starting with Séguy (Séguy 1971 Rev. Linguist. Romane 35), and virtually all report a sublinear growth in aggregate linguistic variation as a function of geographical distance. The literature from dialectology and historical linguistics has mostly traced the diffusion of individual features, however, so that it is sensible to ask what sort of dynamic in the diffusion of individual features is compatible with Séguy's curve. We examine some simulations of diffusion in an effort to shed light on this question.

Section: A Dialectometric View Of Gravity

mentioning

confidence: 99%

Measuring the diffusion of linguistic change

2010

Phil. Trans. R. Soc. B

“…)’ are at the top of the list, all showing strong correlations (r > 0.5) with the first, most significant dimension in the MDS solution, suggesting that the variation in the stressed vowel (standard German [ai], but South [i]) is the single strongest indicator of provenance among the 201 words in our sample. Prokić (2007) explores more systematic analysis of the aligned segments with the goals of identifying the linguistic factors in aggregate analysis.…”

Section: General Characterizations

mentioning

confidence: 99%

Data‐Driven Dialectology

Language and Linguist. Compass

2009

116

Most studies of language variation proceed from the geographic or social distribution of single elements (features), and find it difficult to proceed further. Data‐driven dialectology, and more generally, data‐driven variationist studies, begin instead from an aggregate view of language variation and reap immediate benefits in dealing with well‐known exceptions in the distributions of single features and in avoiding the need to select which features to use as the basis of characterizations. But the major advance is the opportunity to characterize general tendencies in linguistic variation.

“…Nerbonne [10] examined the distance matrices induced by each of two hundred vowel pronunciations automatically extracted from a large American collection, and subsequently applied factor analysis to the covariance matrices obtained from the collection of vowel distance matrices. Prokić [11] analyzed Bulgarian pronunciation using an edit distance algorithm and then collected commonly aligned sounds. She developed an index to measure how characteristic a given sound correspondence is for a given site.…”

Section: Introduction

mentioning

confidence: 99%
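The edit-distance step mentioned in the quoted passage above can be illustrated with plain Levenshtein distance over sequences of sound segments. This is only a minimal sketch: unit costs for insertion, deletion and substitution are an assumption, the function name `edit_distance` is hypothetical, and the two segment lists are invented toy transcriptions rather than data from any of the cited studies.

```python
def edit_distance(source, target):
    """Levenshtein distance between two sequences of segments.

    Assumes unit cost for every insertion, deletion and substitution;
    dialectometric work often uses graded segment costs instead.
    """
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j segments of `target`.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]  # deleting the first i source segments
        for j, t in enumerate(target, start=1):
            substitution = previous[j - 1] + (0 if s == t else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Invented toy transcriptions of one word at two hypothetical sites.
site_a = ["m", "l", "e", "k", "o"]
site_b = ["m", "l", "i", "k", "o"]
toy_distance = edit_distance(site_a, site_b)
```

Averaging such distances over many words gives one aggregate distance per pair of sites, which is the kind of quantity the aggregate analyses described here start from.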

Bipartite spectral graph partitioning for clustering dialect varieties and detecting their linguistic features

Wieling

Computer Speech & Language

2011

In this study we use bipartite spectral graph partitioning to simultaneously cluster varieties and identify their most distinctive linguistic features in Dutch dialect data. While clustering geographical varieties with respect to their features, e.g. pronunciation, is not new, the simultaneous identification of the features which give rise to the geographical clustering presents novel opportunities in dialectometry. Earlier methods aggregated sound differences and clustered on the basis of aggregate differences. The determination of the significant features which co-vary with cluster membership was carried out on a post hoc basis. Bipartite spectral graph clustering simultaneously seeks groups of individual features which are strongly associated, even while seeking groups of sites which share subsets of these same features. We show that the application of this method results in clear and sensible geographical groupings and discuss and analyze the importance of the concomitant features.

Key words: Bipartite spectral graph partitioning, Clustering, Sound correspondences, Dialectometry, Dialectology, Language variation

This paper is an extended version of the study 'Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences in dialectology' by Martijn Wieling and John Nerbonne [1].
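To make the idea of co-clustering sites and features concrete, the following is a minimal sketch of a two-way bipartite spectral partition in the general style of spectral co-clustering. It is not the authors' implementation: the toy site-by-feature matrix is invented, the function name `bipartite_spectral_bipartition` is hypothetical, the second singular vector is found by simple power iteration, and the split uses the sign of the embedding where a full treatment would typically run k-means.

```python
import math


def bipartite_spectral_bipartition(A, iterations=500):
    """Split rows (sites) and columns (features) of A into two co-clusters.

    A is a non-negative site-by-feature matrix in which every row and
    every column has a positive sum (an assumption of this sketch).
    """
    m, n = len(A), len(A[0])
    d1 = [sum(row) for row in A]                             # site degrees
    d2 = [sum(A[i][j] for i in range(m)) for j in range(n)]  # feature degrees
    total = sum(d1)
    # Degree-normalised matrix An = D1^(-1/2) A D2^(-1/2).
    An = [[A[i][j] / math.sqrt(d1[i] * d2[j]) for j in range(n)]
          for i in range(m)]
    # The leading right singular vector of An is known in closed form.
    v1 = [math.sqrt(d / total) for d in d2]
    # Power iteration on An^T An, deflating v1, yields the second one.
    x = [float(j + 1) for j in range(n)]
    for _ in range(iterations):
        proj = sum(a * b for a, b in zip(x, v1))
        x = [a - proj * b for a, b in zip(x, v1)]
        y = [sum(An[i][j] * x[j] for j in range(n)) for i in range(m)]
        x = [sum(An[i][j] * y[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(a * a for a in x))
        x = [a / norm for a in x]
    u = [sum(An[i][j] * x[j] for j in range(n)) for i in range(m)]
    # Rescale by the degrees to get one shared embedding, then split by sign.
    site_scores = [u[i] / math.sqrt(d1[i]) for i in range(m)]
    feature_scores = [x[j] / math.sqrt(d2[j]) for j in range(n)]
    site_labels = [int(s > 0) for s in site_scores]
    feature_labels = [int(s > 0) for s in feature_scores]
    return site_labels, feature_labels


# Invented toy data: 4 sites x 4 features, 1 = feature attested at site.
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
sites, features = bipartite_spectral_bipartition(A)
```

Because sites and features are embedded on the same axis, a site and a feature that receive the same label belong to the same co-cluster, which is what lets the method report characteristic features together with each geographical group.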