Alif Silpachai scite author profile

In this paper, we introduce L2-ARCTIC, a speech corpus of non-native English that is intended for research in voice conversion, accent conversion, and mispronunciation detection. This initial release includes recordings from ten non-native speakers of English whose first languages (L1s) are Hindi, Korean, Mandarin, Spanish, and Arabic, each L1 containing recordings from one male and one female speaker. Each speaker recorded approximately one hour of read speech from the Carnegie Mellon University ARCTIC prompts, from which we generated orthographic and forced-aligned phonetic transcriptions. In addition, we manually annotated 150 utterances per speaker to identify three types of mispronunciation errors: substitutions, deletions, and additions, making it a valuable resource not only for research in voice conversion and accent conversion but also in computer-assisted pronunciation training. The corpus is publicly accessible at https://psi.engr.tamu.edu/l2-arctic-corpus/.

Golden speaker builder – An interactive tool for pronunciation training

Ding, Liberatore, Sonsaat et al. 2019

Speech Communication

Language Teaching Research

The English pronunciation of Arabic speakers: A data-driven approach to segmental error identification

Rehman, Silpachai, Levis et al. 2020

The accurate identification of likely segmental pronunciation errors produced by nonnative speakers of English is a longstanding goal in pronunciation teaching. Most lists of pronunciation errors for speakers of a particular first language (L1) are based on the experience of expert linguists or teachers of English as a second language (ESL) and English as a foreign language (EFL). Such lists are useful, but they are also subject to blind spots for less noticeable errors while suggesting that other more noticeable errors are more important. This exploratory study tested whether using a database of read sentences would reveal recurrent errors that had been overlooked by expert opinions. We did a systematic error analysis of advanced L1 Arabic learners of English (n = 4) using L2 Arctic, a publicly available collection of 1,132 phonetically balanced English sentences read aloud by 24 speakers of six language backgrounds. To test whether the database was useful for pronunciation error identification, we analysed Arabic speakers’ sentence readings (n = 599), which were annotated in Praat for pronunciation deviations from General American English. The findings give an empirically supported description of persistent pronunciation errors for Arabic learners of English. Although necessarily limited in scope, the study demonstrates how similar datasets can be used regardless of the L1 being investigated. The discussion of errors in pronunciation in terms of their functional loads (Brown, 1988) suggests which persistent errors are likely to be important for classroom attention, helping teachers focus their limited classroom time for optimal learning.

The role of talker variability in the perceptual learning of Mandarin tones by American English listeners

2020

JSLP

Research on segmentals has suggested that a key component of High Variability Phonetic Training (HVPT) is high talker variability. However, the extent to which high talker variability improves perception of tones is unclear. This study examined the effects of high talker variability on the perception of Mandarin tones (Tones 1–4) by English-speaking listeners. A training paradigm that used multiple talkers (multitalker group) was compared with a paradigm that used one talker (single-talker group). The results showed that the multitalker group outperformed the single-talker group, and they retained their learning better than the single-talker group did for 6 months. Neither group, however, improved their perception of Tone 1 or generalized their learning of monosyllables to disyllables. The results suggest that although high talker variability can effectively improve tone perception, it does not improve the perception of more tone categories or yield generalization of learning to more contexts compared to low talker variability.

Using high variability phonetic training to train non-tonal listeners with no musical background to perceive lexical tones

2018

Previous research has not extensively investigated whether High Variability Phonetic Training (HVPT) is effective in training listeners with no musical background and no prior experience with a tone language to identify non-native lexical tones. This study investigated whether HVPT is applicable to the acquisition of non-native tones by such listeners. Twenty-one speakers of American English were trained in eight sessions using the HVPT approach to identify Mandarin tones in monosyllabic words. Ten of the participants were exposed to words produced by multiple talkers (MT condition), and eleven participants were exposed to words produced by a single talker (ST condition). The listeners’ identification accuracy revealed an average 44% increase from the pretest to the posttest for the MT condition and an average 30% increase for the ST condition. The improvement also generalized to new monosyllabic words produced by a familiar talker and those produced by two unfamiliar talkers. The learning, however, did not generalize to novel disyllabic words produced by either a familiar talker or an unfamiliar talker. Comparisons between the two groups further revealed that the improvement of the listeners in the MT condition was significantly greater than that of the listeners in the ST condition.

Prosodic characteristics of three sentence types in Thai

2012

This study presents an acoustic analysis of three sentence types in Thai (declarative, interrogative, and emphatic) with the goal of providing a basic characterization of their prosody. To investigate prosodic realizations of sentence-final syllables, we placed, in a sentence-final position, a target word which varied in one of the five lexical tones in Thai. We also varied the tonal context before the target word so that the pre-target word ends with low (21), mid (31), or high (45) tones. Preliminary results from one speaker show that F0 measures, especially F0 maximum, minimum, and range, differed across sentence types. In particular, emphatic sentences were distinguished from non-emphatic sentences by expanded F0 range, whereas target words in questions were distinguished from those in declarative sentences by both higher F0 maximum and minimum. Syllable duration also played a role in signaling emphasis and question: emphatic sentences were significantly longer than non-emphatic sentences, and questions were significantly shorter than declarative sentences. Interestingly, the tonal pattern of the target word changed for the case of emphasis when the target word had 31 and 45 tones. We will present findings from four additional Thai speakers and discuss their relevance to the intonational phonology of Thai.

The roles of vowel length and sentential context in onset pitch perturbations in Thai

2019

This study investigates the relationship between fundamental frequency at the onset of voicing (onset f0) and Voice Onset Time (VOT) in a tonal language with prevoiced, short-lag, and long-lag stops. Recent research on Thai and Vietnamese has suggested that higher f0 in the following vowel is conditioned by long-lag stops, but this effect occurs more in higher, not lower, tones and in words produced in isolation, not in a carrier phrase. An examination of previous studies, however, suggests that the effect may be moderated by vowel length and the type of carrier phrase. To determine whether this is true, this study compares onset f0 measured 40 ms after voicing onset in Thai low tone words with phonemically short and long vowels that occur in two types of carrier phrases and in isolation. The results show that prevoiced, not short- or long-lag, stops condition higher onset f0 in short, not long, vowels, and this effect takes place in words occurring in both types of carrier phrases, not in isolation. This suggests that vowel length may be a relevant factor. The results will be discussed further, and implications for onset f0 control will be offered.

Speech Intelligibility

Levis, Silpachai 2022