This paper presents a two-dimensional (2D) visual-speech synthesizer to support language learning. A visual-speech synthesizer animates the human articulators in synchronization with speech signals, e.g., the output of a text-to-speech synthesizer. A visual-speech animation offers language learners a concrete illustration of how to move and where to place the articulators when pronouncing a phoneme. We adopt 2D vector-based viseme models and compile a collection of visemes that covers the articulation of all English phonemes (42 visemes for the 44 English phonemes). Morphing between properly selected vector-based articulation images achieves articulatory animation. In this way, we have developed an articulatory visual-speech synthesizer that accepts free-text input and synthesizes articulatory dynamics in real time. A "lip-reading" evaluation involving 32 subjects shows that they can identify the appropriate word(s) from the articulation animation alone roughly 80% of the time.
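The morphing step lends itself to a compact illustration. Below is a minimal sketch, assuming each viseme is stored as a 2D array of articulator control points; the function name and the sample shapes are hypothetical placeholders, not the paper's actual viseme models:

```python
# Sketch: morph between two vector-based viseme keyframes by linearly
# interpolating their control points (an assumed representation).
import numpy as np

def morph_visemes(src: np.ndarray, dst: np.ndarray, t: float) -> np.ndarray:
    """Interpolate articulator control points from src to dst at t in [0, 1]."""
    return (1.0 - t) * src + t * dst

# Illustrative 2D control-point sets (e.g., a lip contour) for two visemes;
# real models would use many more points per articulator.
viseme_open = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0]])
viseme_closed = np.array([[0.0, 0.1], [1.0, 0.1], [2.0, 0.1]])

# Ten intermediate frames form the articulatory animation between keyframes.
frames = [morph_visemes(viseme_open, viseme_closed, t)
          for t in np.linspace(0.0, 1.0, 10)]
```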
In second language learning, unawareness of the differences between correct and incorrect pronunciations is one of the largest obstacles to mispronunciation correction. To make the feedback more discriminatively perceptible, this paper presents a novel method for corrective feedback generation in language learning: exaggerated feedback. To produce the exaggeration effect, the neutral audio and visual speech are both exaggerated and then re-synthesized synchronously based on audiovisual synthesis technology. Audio-speech exaggeration is realized by adjusting the acoustic features related to duration, pitch, and energy of the speech, depending on the phoneme. Visual-speech exaggeration is realized by increasing the range of articulatory movement and slowing the movement down around the key action. The results show that our methods can effectively generate a bimodal exaggeration effect for feedback provision and make the feedback more discriminable to the learner.
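As a rough illustration of the two exaggeration strategies, the sketch below assumes a 1-D pitch contour for the audio channel and a 1-D articulatory opening trajectory for the visual channel; the scaling factors and the frame-repetition scheme for slowing the key action are illustrative assumptions, not the paper's actual parameters:

```python
# Sketch of bimodal exaggeration under assumed 1-D representations.
import numpy as np

def exaggerate_pitch(f0: np.ndarray, factor: float = 1.5) -> np.ndarray:
    """Widen pitch excursions around the voiced mean to increase salience."""
    voiced = f0 > 0                    # treat zero frames as unvoiced
    out = f0.copy()
    mean = f0[voiced].mean()
    out[voiced] = mean + factor * (f0[voiced] - mean)
    return out

def exaggerate_articulation(traj: np.ndarray, amp: float = 1.3,
                            slow: int = 2) -> np.ndarray:
    """Enlarge the movement range, then slow frames near the key action."""
    scaled = traj.mean() + amp * (traj - traj.mean())
    peak = int(np.argmax(np.abs(scaled - scaled.mean())))
    lo, hi = max(0, peak - 2), min(len(scaled), peak + 3)
    # Repeating frames around the peak slows the key action down.
    return np.concatenate([scaled[:lo],
                           np.repeat(scaled[lo:hi], slow),
                           scaled[hi:]])
```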
This paper presents our group's latest progress in developing Enunciate, an online computer-aided pronunciation training (CAPT) system for Chinese learners of English. Presently, the system targets segmental pronunciation errors. It consists of an audio-enabled web interface, a speech recognizer for mispronunciation detection and diagnosis, a speech synthesizer, and a viseme animator. We present a summary of the system's architecture and major interactive features, along with statistics from evaluations by English teachers and university students who participated in pilot trials. We are also extending the system to cover suprasegmental training and mobile access.
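To make the component list concrete, here is a minimal sketch of how such a pipeline might be wired together; all class and method names are hypothetical stand-ins, since the abstract does not expose Enunciate's actual interfaces:

```python
# Hypothetical wiring of a CAPT feedback loop like the one described above.
from dataclasses import dataclass

@dataclass
class Diagnosis:
    phoneme: str   # target phoneme the learner mispronounced
    hint: str      # short corrective hint shown in the web interface

class StubRecognizer:
    def diagnose(self, audio: bytes, target_text: str) -> list[Diagnosis]:
        # A real recognizer would align the audio to the target and flag errors.
        return [Diagnosis("TH", "Place the tongue tip between the teeth.")]

class CAPTPipeline:
    """Wires recognizer, synthesizer, and viseme animator into one loop."""

    def __init__(self, recognizer, synthesizer=None, animator=None):
        self.recognizer = recognizer    # mispronunciation detection/diagnosis
        self.synthesizer = synthesizer  # plays the correct pronunciation
        self.animator = animator        # animates the correct articulation

    def give_feedback(self, audio: bytes, target_text: str) -> list[Diagnosis]:
        errors = self.recognizer.diagnose(audio, target_text)
        for e in errors:
            if self.synthesizer:
                self.synthesizer.speak(e.phoneme)
            if self.animator:
                self.animator.show(e.phoneme)
        return errors

pipeline = CAPTPipeline(StubRecognizer())
print(pipeline.give_feedback(b"", "think"))
```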