This study investigates how second language (L2) listeners’ perception is affected by two factors: the listeners’ experience with the target dialect (North American English, NAE, vs. Standard Southern British English, SSBE) and the talkers’ language background (native vs. non-native), i.e. interlanguage speech intelligibility benefit (ISIB) talker effects. Two groups of native-Korean-speaking listeners with different target English dialects – L1-Korean listeners of English as a second language (ESL) in the USA and L1-Korean ESL listeners in the UK – were tested on the identification of 12 English vowels spoken by native and non-native (L1-Korean) talkers of NAE and SSBE. The results show that the L2 listeners’ experience with the target dialect had a significant impact on the accuracy of their L2 vowel identification. However, no ISIB-talker effects were observed for either L1-Korean listener group, regardless of the listeners’ differing experience with the two varieties of English. The study adds to the L2 sound acquisition and ISIB literatures by examining L2 learners’ identification of L2 vowels while taking into account the learners’ differing experience with two standard varieties of English (NAE and SSBE) and the interaction between that experience and ISIB-talker effects. It also sheds light on adult L2 learners’ ability to learn the vowels of a new target variety.
Korean learners of English must create four vowel categories for English (/i, ɪ/ and /ɛ, æ/) in relation to two similar native categories (/i/ and /ɛ/). New categories are hypothesized to be easier to learn than similar ones (Flege, 1994), but it is unclear whether these English L2 vowels are similar or new. The degree of similarity between the four English vowels and the two Korean vowels was examined using distributional metrics (ellipse overlap, cross-entropy, and Gaussian mixture modelling) as well as Euclidean distance in F1/F2 space. The L2 speech corpus comprised 100 repetitions of words in both Korean and English produced by 37 Korean L2 learners (20 female). Preliminary results indicate that the English (L2) high vowel pair overlapped more with Korean /i/ than the low vowel pair did with Korean /ɛ/, especially for male speakers. For the English (L2) low vowel pair, female speakers showed less overlap but higher variability along the F1 dimension than male speakers. This demonstrates that the similarity between Korean and English vowels is characterized by distributional overlap as well as by the distances between vowel categories. Acoustic results will be further compared with identification by native English speakers.
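The contrast between distance-based and distribution-based similarity measures described above can be sketched as follows. This is a minimal illustration, not the study’s analysis pipeline: the formant values and standard deviations are invented for demonstration, and the distributional measure shown here is the Bhattacharyya coefficient between two diagonal-covariance Gaussians (the study’s ellipse-overlap, cross-entropy, and GMM metrics would differ in detail).

```python
import math

# Hypothetical mean F1/F2 values in Hz -- illustrative only,
# not data from the study.
means = {
    "Korean_i":  (300, 2300),
    "English_i": (290, 2350),
    "English_I": (400, 2100),
}

def euclidean(a, b):
    """Euclidean distance between two category means in F1/F2 space."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def bhattacharyya_coeff(m1, s1, m2, s2):
    """Overlap between two 2-D Gaussians with diagonal covariances:
    the Bhattacharyya coefficient, 1.0 for identical distributions,
    approaching 0.0 as the categories separate.
    m1, m2: (F1, F2) means; s1, s2: (F1, F2) standard deviations."""
    bc = 1.0
    for mu1, sd1, mu2, sd2 in zip(m1, s1, m2, s2):
        v1, v2 = sd1 ** 2, sd2 ** 2
        # Per-dimension Bhattacharyya distance for 1-D Gaussians
        db = (0.25 * (mu1 - mu2) ** 2 / (v1 + v2)
              + 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2))))
        bc *= math.exp(-db)
    return bc
```

Two categories can have nearly identical means (small Euclidean distance) yet overlap little if their dispersions differ, which is why distributional metrics can characterize similarity beyond what centroid distances capture.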
Ultrasound imaging is a non-invasive technique for measuring the tongue during speech. Recent advances in analytical edge detection algorithms and deep learning methods have improved tongue contour segmentation. However, most edge detection algorithms require user input as initialization “seeds”, and accuracy can drift as segmentation proceeds. Deep learning for ultrasound tongue contour tracking, on the other hand, requires large, manually labelled training sets, has poor spatial resolution, and does not generalize well to images acquired by ultrasound machines outside the training set. Here, we demonstrate an approach that combines edge detection and deep learning for automatic tongue contour tracking in ultrasound images of the tongue, aided by synchronously acquired electromagnetic articulometry (EMA) data. A deep learning model was trained on edge detection seeded by tongue locations from EMA, with minimal human intervention. Spatial and temporal constraints on tongue contours were learned simultaneously using a three-dimensional convolutional neural network. Finally, the low-resolution tongue contour inferred by the deep neural network was passed through additional edge detection to refine the contour at higher resolution. Our preliminary results demonstrate that the proposed architecture improves accuracy compared to using analytical edge detection or deep learning alone.
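The seeded edge-refinement idea described above can be sketched in simplified form: given an approximate contour (one seed row per image column, as EMA landmarks or a coarse network prediction might supply), search a small vertical window around each seed for the strongest intensity change. The function name, window size, and data layout here are illustrative assumptions, not the authors’ implementation.

```python
def refine_contour(image, seed_rows, window=3):
    """Refine an approximate tongue contour by local edge search.

    image: 2-D list of intensities indexed as image[row][col]
    seed_rows: approximate contour row for each column (e.g. from EMA)
    window: vertical search radius around each seed, in pixels

    Returns one refined row per column: the row within the window
    whose vertical intensity gradient is largest.
    """
    n_rows = len(image)
    refined = []
    for col, seed in enumerate(seed_rows):
        lo = max(1, seed - window)            # gradient needs row - 1
        hi = min(n_rows, seed + window + 1)
        # Pick the row with the largest vertical intensity change,
        # i.e. the strongest local edge near the seed.
        best = max(range(lo, hi),
                   key=lambda r: abs(image[r][col] - image[r - 1][col]))
        refined.append(best)
    return refined
```

Because the search is confined to a window around each seed, the refinement inherits the rough placement from the seeds (avoiding the drift of unconstrained edge tracking) while recovering pixel-level detail that a low-resolution network output lacks.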
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.