With an eye toward measuring the strength of foreign accents in American English, we evaluate the suitability of a modified version of the Levenshtein distance for comparing (the phonetic transcriptions of) accented pronunciations. Although this measure has been used successfully inter alia to study the differences among dialect pronunciations, it has not been applied to studying foreign accents. Here, we use it to compare the pronunciation of non-native English speakers to native American English speech. Our results indicate that the Levenshtein distance is a valid native-likeness measurement, as it correlates strongly (r = -0.81) with the average “native-like” judgments given by more than 1000 native American English raters.
In this study we develop pronunciation distances based on naive discriminative learning (NDL). Measures of pronunciation distance are used in several subfields of linguistics, including psycholinguistics, dialectology and typology. In contrast to the commonly used Levenshtein algorithm, NDL is grounded in cognitive theory of competitive reinforcement learning and is able to generate asymmetrical pronunciation distances. In a first study, we validated the NDL-based pronunciation distances by comparing them to a large set of native-likeness ratings given by native American English speakers when presented with accented English speech. In a second study, the NDL-based pronunciation distances were validated on the basis of perceptual dialect distances of Norwegian speakers. Results indicated that the NDL-based pronunciation distances matched perceptual distances reasonably well with correlations ranging between 0.7 and 0.8. While the correlations were comparable to those obtained using the Levenshtein distance, the NDL-based approach is more flexible as it is also able to incorporate acoustic information other than sound segments.
We examine a case of word order variation where speakers choose between two near-synonymous constructions partly on the basis of the processing complexity of the construction and its context. When producing two-verb clusters in Dutch, a speaker can choose between two word orders. Previous corpus studies have shown that a wide range of factors are associated with this word order variation. We conducted a large-scale corpus study in order to discover what these factors have in common. The underlying generalization appears to be processing complexity: we show that a variety of factors that are related to verbal cluster word order, can also be related to the processing complexity of the cluster's context. This implies that one of the word orders might be easier to process d when processing load is high, speakers will go for the easier option. Therefore, we also investigate which of the two word orders might be easier to process. By testing for associations with factors indicating a higher or lower processing complexity of the verb and its context, we find evidence for the hypothesis that the word order where the main verb comes last is easier to process.
In this work, we address the evaluation of distributional semantic models trained on smaller, domain-specific texts, particularly philosophical text. Specifically, we inspect the behaviour of models using a pretrained background space in learning. We propose a measure of consistency which can be used as an evaluation metric when no in-domain gold-standard data is available. This measure simply computes the ability of a model to learn similar embeddings from different parts of some homogeneous data. We show that in spite of being a simple evaluation, consistency actually depends on various combinations of factors, including the nature of the data itself, the model used to train the semantic space, and the frequency of the learned terms, both in the background space and in the in-domain data of interest.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.