In this paper, we introduce L2-ARCTIC, a speech corpus of non-native English that is intended for research in voice conversion, accent conversion, and mispronunciation detection. This initial release includes recordings from ten non-native speakers of English whose first languages (L1s) are Hindi, Korean, Mandarin, Spanish, and Arabic, each L1 containing recordings from one male and one female speaker. Each speaker recorded approximately one hour of read speech from the Carnegie Mellon University ARCTIC prompts, from which we generated orthographic and forced-aligned phonetic transcriptions. In addition, we manually annotated 150 utterances per speaker to identify three types of mispronunciation errors: substitutions, deletions, and additions. These annotations make the corpus a valuable resource not only for research in voice conversion and accent conversion but also for computer-assisted pronunciation training. The corpus is publicly accessible at https://psi.engr.tamu.edu/l2-arctic-corpus/.
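As a rough illustration of how the three annotated error types could be tallied, the following Python sketch counts substitutions, deletions, and additions in a list of annotated phones. The tuple layout `(canonical, produced, tag)` and the one-letter tags are assumptions made for this example, not a description of the corpus's actual annotation files, and `count_errors` is a hypothetical helper.

```python
# Hypothetical annotation format, assumed for this sketch only:
# each item is (canonical phone, produced phone, tag), where the tag is
# "s" (substitution), "d" (deletion), or "a" (addition).
TAG_NAMES = {"s": "substitution", "d": "deletion", "a": "addition"}


def count_errors(annotations):
    """Count each mispronunciation type in a list of annotated phones."""
    counts = {name: 0 for name in TAG_NAMES.values()}
    for _canonical, _produced, tag in annotations:
        if tag not in TAG_NAMES:
            raise ValueError(f"unknown tag: {tag!r}")
        counts[TAG_NAMES[tag]] += 1
    return counts


# Made-up example data, not taken from the corpus.
example = [
    ("Z", "S", "s"),    # canonical /Z/ produced as /S/
    ("D", None, "d"),   # canonical /D/ not produced
    (None, "AH", "a"),  # extra /AH/ inserted
    ("TH", "S", "s"),   # canonical /TH/ produced as /S/
]
print(count_errors(example))
```

Keeping the tag-to-name mapping in one dictionary means an unexpected tag fails loudly instead of being silently miscounted.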
The accurate identification of likely segmental pronunciation errors produced by nonnative speakers of English is a longstanding goal in pronunciation teaching. Most lists of pronunciation errors for speakers of a particular first language (L1) are based on the experience of expert linguists or teachers of English as a second language (ESL) and English as a foreign language (EFL). Such lists are useful, but they are also subject to blind spots for less noticeable errors while suggesting that other more noticeable errors are more important. This exploratory study tested whether using a database of read sentences would reveal recurrent errors that had been overlooked by expert opinions. We did a systematic error analysis of advanced L1 Arabic learners of English (n = 4) using L2 Arctic, a publicly available collection of 1,132 phonetically-balanced English sentences read aloud by 24 speakers of six language backgrounds. To test whether the database was useful for pronunciation error identification, we analysed Arabic speakers’ sentence readings (n = 599), which were annotated in Praat for pronunciation deviations from General American English. The findings give an empirically supported description of persistent pronunciation errors for Arabic learners of English. Although necessarily limited in scope, the study demonstrates how similar datasets can be used regardless of the L1 being investigated. The discussion of errors in pronunciation in terms of their functional loads (Brown, 1988) suggests which persistent errors are likely to be important for classroom attention, helping teachers focus their limited classroom time for optimal learning.
Research on segmentals has suggested that a key component of High Variability Phonetic Training (HVPT) is high talker variability. However, the extent to which high talker variability improves perception of tones is unclear. This study examined the effects of high talker variability on the perception of Mandarin tones (Tones 1–4) by English-speaking listeners. A training paradigm that used multiple talkers (multitalker group) was compared with a paradigm that used one talker (single-talker group). The results showed that the multitalker group outperformed the single-talker group and retained its learning better than the single-talker group did for 6 months. Neither group, however, improved its perception of Tone 1 or generalized its learning of monosyllables to disyllables. The results suggest that although high talker variability can effectively improve tone perception, it does not improve the perception of more tone categories or yield generalization of learning to more contexts compared to low talker variability.
Previous research has not extensively investigated whether High Variability Phonetic Training (HVPT) is effective in training listeners with no musical background and no prior experience with a tone language to identify non-native lexical tones. This study investigated whether HVPT is applicable to the acquisition of non-native tones by such listeners. Twenty-one speakers of American English were trained in eight sessions using the HVPT approach to identify Mandarin tones in monosyllabic words. Ten of the participants were exposed to words produced by multiple talkers (MT condition), and eleven participants were exposed to words produced by a single talker (ST condition). The listeners’ identification accuracy revealed an average 44% increase from the pretest to the posttest for the MT condition and an average 30% increase for the ST condition. The improvement also generalized to new monosyllabic words produced by a familiar talker and those produced by two unfamiliar talkers. The learning, however, did not generalize to novel disyllabic words produced by either a familiar talker or an unfamiliar talker. Comparisons between the two groups further revealed that the improvement of the listeners in the MT condition was significantly higher than that of the listeners in the ST condition.
This study presents an acoustic analysis of three sentence types in Thai (declarative, interrogative, and emphatic) with the goal of providing a basic characterization of their prosody. To investigate prosodic realizations of sentence-final syllables, we placed, in sentence-final position, a target word that carried one of the five lexical tones in Thai. We also varied the tonal context before the target word so that the pre-target word ended with a low (21), mid (31), or high (45) tone. Preliminary results from one speaker show that F0 measures, especially F0 maximum, minimum, and range, differed across sentence types. In particular, emphatic sentences were distinguished from non-emphatic sentences by expanded F0 range, whereas target words in questions were distinguished from those in declarative sentences by both higher F0 maximum and minimum. Syllable duration also played a role in signaling emphasis and questions: emphatic sentences were significantly longer than non-emphatic sentences, and questions were significantly shorter than declarative sentences. Interestingly, the tonal pattern of the target word changed under emphasis when the target word had the 31 or 45 tone. We will present findings from four additional Thai speakers and discuss their relevance to the intonational phonology of Thai.
This study investigates the relationship between fundamental frequency at the onset of voicing (onset f0) and Voice Onset Time (VOT) in a tonal language with prevoiced, short-lag, and long-lag stops. Recent research on Thai and Vietnamese has suggested that higher f0 in the following vowel is conditioned by long-lag stops, but this effect occurs more in higher, not lower, tones and in words produced in isolation, not in a carrier phrase. An examination of previous studies, however, suggests that the effect may be moderated by vowel length and the type of carrier phrase. To determine whether this is true, this study compares onset f0 measured 40 ms after voicing onset in Thai low tone words with phonemically short and long vowels that occur in two types of carrier phrases and in isolation. The results show that prevoiced, not short- or long-lag, stops condition higher onset f0 in short, not long, vowels, and this effect takes place in words occurring in both types of carrier phrases, not in isolation. This suggests that vowel length may be a relevant factor. The results will be discussed further, and implications for onset f0 control will be offered.
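To make the onset f0 measure concrete, the sketch below picks the f0 value of the voiced frame closest to 40 ms after voicing onset. It assumes a pitch track given as `(time in seconds, f0 in Hz or None for unvoiced)` pairs; the function name `onset_f0`, the track format, and the sample values are illustrative assumptions, not the study's actual measurement procedure.

```python
def onset_f0(track, voicing_onset, offset=0.040):
    """Return f0 (Hz) of the voiced frame nearest to voicing_onset + offset.

    track: list of (time_s, f0_hz) pairs; f0_hz is None for unvoiced frames.
    voicing_onset: time in seconds at which voicing begins.
    offset: measurement point after voicing onset, in seconds (40 ms here).
    """
    target = voicing_onset + offset
    voiced = [(t, f0) for t, f0 in track if f0 is not None]
    if not voiced:
        raise ValueError("no voiced frames in track")
    # Choose the voiced frame whose timestamp is closest to the target time.
    _, f0 = min(voiced, key=lambda frame: abs(frame[0] - target))
    return f0


# Made-up pitch track with 20 ms frames; voicing starts at 0.12 s.
track = [(0.10, None), (0.12, 110.0), (0.14, 118.0), (0.16, 121.0), (0.18, 119.0)]
print(onset_f0(track, voicing_onset=0.12))
```

Taking the nearest voiced frame avoids interpolating across unvoiced gaps; a real analysis might instead interpolate or average over a short window.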