In recent years, with increasingly frequent international exchange, people have come to recognize that language is first and foremost a tool for communication, and that language learning should therefore give due weight to oral instruction. In traditional classrooms, however, oral instruction faces a mismatched teacher-student ratio: one teacher must serve dozens of students, so one-on-one speaking practice and pronunciation guidance is impossible, and instruction is further constrained by teacher availability and the classroom environment. Research on efficient, automated pronunciation training has therefore attracted growing attention. Many English phonemes, vowels in particular, have distinct visual features on the face: almost all vowels can be distinguished by the roundness and tension of the lips. To exploit lip features for pronunciation error detection, this paper proposes a multimodal feature fusion model based on lip opening-and-closing angle features. The model interpolates the lip angle features to align the video stream with the audio stream in time, fuses the two modalities, performs feature learning and classification through a bidirectional LSTM with a softmax layer, and achieves end-to-end pronunciation error detection through CTC. The model is evaluated on the GRID audio-visual corpus after phoneme conversion and on a self-built multimodal test set. The experimental results show that the model achieves a markedly higher mispronunciation detection rate than a traditional single-modal acoustic error detection model, and experiments on the same corpora with added white noise show that the proposed model is more robust to noise than the traditional acoustic model.
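The abstract attributes the visual cues to lip opening-and-closing angles but does not spell out how they are computed. The sketch below is one plausible, hypothetical construction, assuming dlib-style 68-point facial landmarks; the landmark indices (48, 51, 54, 57) and the added roundness ratio are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def lip_angle_features(landmarks):
    """Hypothetical per-frame lip features from dlib-style 68-point
    landmarks (assumed indices): left mouth corner 48, right corner 54,
    upper-lip midpoint 51, lower-lip midpoint 57."""
    left, right = landmarks[48], landmarks[54]
    top, bottom = landmarks[51], landmarks[57]

    def angle_at(corner, a, b):
        # Angle at `corner` between the rays corner->a and corner->b
        u, v = a - corner, b - corner
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
        return np.arccos(np.clip(cos, -1.0, 1.0))

    # Opening angles at the two mouth corners capture lip tension;
    # the height/width ratio is a simple proxy for lip roundness.
    left_angle = angle_at(left, top, bottom)
    right_angle = angle_at(right, top, bottom)
    roundness = np.linalg.norm(top - bottom) / (np.linalg.norm(left - right) + 1e-8)
    return np.array([left_angle, right_angle, roundness])
```

Stacking this per-frame vector over a video clip yields the visual feature sequence that a fusion model such as the one sketched next would consume.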
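Likewise, the alignment, fusion, BiLSTM-softmax, and CTC stages named in the abstract can be sketched end to end. The PyTorch code below is a minimal illustration under assumed settings (39-dimensional acoustic frames, the 3 lip features from the sketch above, 40 phoneme classes plus a CTC blank); it is not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LipAudioFusionCTC(nn.Module):
    """Minimal sketch of the fusion model described in the abstract:
    lip-angle features are interpolated to the audio frame rate, fused
    with acoustic features, and decoded by a BiLSTM + softmax + CTC.
    All dimensions are illustrative assumptions, not the paper's values."""

    def __init__(self, audio_dim=39, lip_dim=3, hidden=128, n_phonemes=40):
        super().__init__()
        self.blstm = nn.LSTM(audio_dim + lip_dim, hidden,
                             batch_first=True, bidirectional=True)
        # +1 output unit for the CTC blank label at index 0
        self.proj = nn.Linear(2 * hidden, n_phonemes + 1)

    def forward(self, audio, lips):
        # audio: (B, T_a, audio_dim) acoustic frames (e.g. MFCCs)
        # lips:  (B, T_v, lip_dim) per-video-frame lip angle features
        # Linearly interpolate the slower video stream to the audio rate
        lips_up = F.interpolate(lips.transpose(1, 2), size=audio.size(1),
                                mode="linear", align_corners=False)
        fused = torch.cat([audio, lips_up.transpose(1, 2)], dim=-1)
        out, _ = self.blstm(fused)
        # CTC expects log-probabilities over labels including the blank
        return self.proj(out).log_softmax(dim=-1)


# Hypothetical training step (shapes and labels are placeholders)
model = LipAudioFusionCTC()
ctc = nn.CTCLoss(blank=0)
audio = torch.randn(2, 200, 39)          # 2 utterances, 200 audio frames
lips = torch.randn(2, 50, 3)             # 50 video frames of lip features
targets = torch.randint(1, 41, (2, 12))  # reference phoneme sequences
log_probs = model(audio, lips).transpose(0, 1)  # (T, B, C) for CTCLoss
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 200, dtype=torch.long),
           target_lengths=torch.full((2,), 12, dtype=torch.long))
loss.backward()
```

Linear interpolation of the slower video stream onto the audio frame rate is the simplest way to realize the time-series alignment the abstract describes; the CTC loss then lets the network learn the frame-to-phoneme alignment without frame-level labels, which is what makes the error detection end-to-end.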