Abstract: Pairwise string alignment (PSA) is an important general technique for obtaining a measure of similarity between two strings, used e.g., in dialectology, historical linguistics, transliteration, and in evaluating name distinctiveness. The current study focuses on evaluating different PSA methods at the alignment level instead of via the distances it induces. About 3.5 million pairwise alignments of Bulgarian phonetic dialect data are used to compare four algorithms with a manually corrected gold standard. The a…
“…In dialectometry (Wieling et al, 2009), the segment-segment similarity matrix is estimated using pointwise mutual information (PMI). The PMI score for two sounds x and y is defined as followed:…”
This paper presents a computational analysis of Gondi dialects spoken in central India. We present a digitized data set of the dialect area, and analyze the data using different techniques from dialectometry, deep learning, and computational biology. We show that the methods largely agree with each other and with the earlier non-computational analyses of the language group.
“…Hanks, 1990). This method was introduced by Wieling et al (2009) and found to yield superior alignments as well as acoustically sensible sound correspondences (Wieling et al, to appear). 5 As multiple speakers were interviewed in every location, we used the most frequent phonetic variant as representative of all attested PVs for every normalized form.…”
“…While the original Levenshtein edit distance is based on these three operations without any restrictions, later algorithms adapt this method by additional edit operations or restrictions. Wieling et al (2009) compare several alignment algorithms applied to dialect pronunciation data. These algorithms include several adaptations of the Levenshtein algorithm and the Pair Hidden Markov Model.…”
Section: Levenshtein-based Algorithms (mentioning)
confidence: 99%
“…Levenshtein algorithm with distances based on PMI: Wieling et al (2009) use Point-wise Mutual Information (PMI) as the basis for segment distances. They assign different costs to segments, and use the entire dataset for each alignment.…”
Section: Levenshtein-based Algorithms (mentioning)
confidence: 99%
“…All alignment algorithms based on Levenshtein distance evaluated by Wieling et al (2009) restrict aligning vowels with consonants.…”
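The statements above describe Levenshtein variants that forbid aligning vowels with consonants and that may use segment-specific costs. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the implementation evaluated by Wieling et al (2009): the vowel inventory `VOWELS`, the function name `constrained_levenshtein`, the unit insertion and deletion costs, and the pluggable `sub_cost` parameter are all hypothetical choices made for this example.

```python
import math

# Assumed toy vowel inventory; a real study would use its own segment classes.
VOWELS = set("aeiouyəɛɔ")


def is_vowel(seg):
    """Return True if the segment is in the assumed vowel inventory."""
    return seg in VOWELS


def constrained_levenshtein(a, b, sub_cost=None):
    """Edit distance in which a vowel may never be substituted for a consonant.

    a, b      -- sequences of segments (strings work for single-character segments)
    sub_cost  -- optional function (x, y) -> cost for an allowed substitution;
                 defaults to 0 for identical segments and 1 otherwise.
    Insertions and deletions cost 1 each (an assumption of this sketch).
    """
    if sub_cost is None:
        sub_cost = lambda x, y: 0.0 if x == y else 1.0

    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)  # delete every segment of a
    for j in range(1, m + 1):
        d[0][j] = float(j)  # insert every segment of b

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            if is_vowel(x) != is_vowel(y):
                # Vowel-consonant substitution is disallowed.
                sub = math.inf
            else:
                sub = d[i - 1][j - 1] + sub_cost(x, y)
            d[i][j] = min(d[i - 1][j] + 1.0,  # deletion
                          d[i][j - 1] + 1.0,  # insertion
                          sub)                # allowed substitution
    return d[n][m]
```

With the default costs, a vowel-consonant mismatch has to be handled by one deletion plus one insertion, so it costs 2 rather than 1. A PMI-derived segment distance could be passed in as `sub_cost`, assuming it has first been converted to a non-negative cost.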
This paper addresses the problems of measuring similarity between languages, where the term language covers any of the senses denoted by language, dialect or linguistic variety, as defined by any theory. We argue that to devise an effective way to measure the similarity between languages one should build a probabilistic model that tries to capture as much regular correspondence between the languages as possible. This approach yields two benefits. First, given a set of language data, for any two models, this gives a way of objectively determining which model is better, i.e., which model is more likely to be accurate and informative. Second, given a model, for any two languages we can determine, in a principled way, how close they are. The better models will be better at judging similarity. We present experiments on data from three language families to support these ideas. In particular, our results demonstrate the arbitrary nature of terms such as language vs. dialect, when applied to related languages.