Modern speech recognition systems achieve over 97% accuracy on several benchmark data sets, but their accuracy degrades sharply in noisy environments, and improving recognition performance under noise remains a challenging task. Because visual information is unaffected by acoustic noise, researchers often use lip information to aid speech recognition, which makes lip-reading performance and the quality of cross-modal fusion particularly important. In this paper, we improve speech recognition accuracy in noisy environments by improving both lip-reading performance and cross-modal fusion. First, because the same lip movement may correspond to multiple sounds, we construct a one-to-many mapping between lip movements and speech, allowing the lip-reading model to consider which articulations an input lip movement could plausibly represent. In addition, audio representations are preserved in a memory by modeling the relationships between paired audio-visual representations; at the inference stage, the preserved audio representations can be retrieved from this memory through the learned relationships using video input alone. Second, a joint cross-fusion model based on the attention mechanism effectively exploits complementary inter-modal relationships by computing cross-attention weights from the correlations between joint feature representations and the individual modalities. Finally, our proposed model reduces the word error rate (WER) by 4.0% at −15 dB SNR compared with the baseline method, and by 10.1% compared with audio-only speech recognition. The experimental results show that our method significantly outperforms audio-only speech recognition models across different noise conditions.
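To make the memory mechanism concrete, the following is a minimal sketch of how a key-value memory that preserves audio representations and is addressed by visual features might look. The class name `VisualAudioMemory`, the slot count, and the feature dimensions are illustrative assumptions, not the paper's exact architecture; during training, the recalled representation would be pulled toward the paired audio representation (e.g., by a reconstruction loss), so that at inference audio-like features can be recovered from video alone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualAudioMemory(nn.Module):
    """Sketch of a key-value memory: visual features softly address learned
    keys, and the weighted sum of value slots recalls a stored audio-like
    representation. Slot count and dimensions are assumptions."""

    def __init__(self, num_slots: int = 128, dim: int = 256):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim))    # addressed by video
        self.values = nn.Parameter(torch.randn(num_slots, dim))  # hold audio info

    def forward(self, video_feat: torch.Tensor) -> torch.Tensor:
        # video_feat: (batch, time, dim)
        # Soft addressing: similarity between video features and memory keys.
        addr = F.softmax(video_feat @ self.keys.t(), dim=-1)     # (B, T, slots)
        # Weighted sum over value slots recalls an audio representation,
        # so audio information is recovered from video input alone.
        return addr @ self.values                                # (B, T, dim)
```

Because the addressing is a soft distribution over multiple slots, a single lip movement can recall a mixture of stored audio representations, which is one way to realize the one-to-many lip-to-speech mapping described above.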
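The joint cross-attention fusion can be sketched in the same spirit. Assuming frame-aligned audio and video features of equal dimension, the snippet below forms a joint (concatenated) representation, computes per-modality attention weights from its correlation with each modality, and fuses the attended features; `JointCrossAttention` and its projections are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointCrossAttention(nn.Module):
    """Sketch of joint cross-attention fusion: attention weights for each
    modality come from correlating a joint audio-visual representation
    with that modality's own features. Dimensions are assumptions."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # Project the joint representation into each modality's feature space.
        self.proj_a = nn.Linear(2 * dim, dim, bias=False)
        self.proj_v = nn.Linear(2 * dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, audio: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
        # audio, video: (batch, time, dim), assumed temporally aligned.
        joint = torch.cat([audio, video], dim=-1)                # (B, T, 2*dim)
        # Cross-attention weights from joint-vs-modality correlations.
        attn_a = F.softmax(self.proj_a(joint) @ audio.transpose(1, 2) * self.scale, dim=-1)
        attn_v = F.softmax(self.proj_v(joint) @ video.transpose(1, 2) * self.scale, dim=-1)
        # Attend each modality with its joint-conditioned weights, then fuse.
        return attn_a @ audio + attn_v @ video                   # (B, T, dim)

# Example usage with random features (hypothetical shapes)
fusion = JointCrossAttention(dim=256)
a = torch.randn(2, 50, 256)   # audio features
v = torch.randn(2, 50, 256)   # video features
fused = fusion(a, v)          # (2, 50, 256)
```

Conditioning the attention weights on the joint representation, rather than on either modality alone, is what lets the fusion exploit complementary inter-modal relationships: each modality is re-weighted by how well it agrees with the combined audio-visual evidence.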