2023
DOI: 10.3390/s23042053

Improving Speech Recognition Performance in Noisy Environments by Enhancing Lip Reading Accuracy

Abstract: The current accuracy of speech recognition has been able to reach over 97% on different data sets, but the accuracy of speech recognition in noisy environments is greatly reduced. Improving speech recognition performance in noisy environments is a challenging task. Due to the fact that visual information is not affected by noise, researchers often use lip information to help improve speech recognition performance. This is where the performance of lip reading and the effect of cross-modal fusion are particularl…

Cited by 10 publications (5 citation statements)
References 33 publications
“…For example, individual differences in co-articulation may underlie difficulties in the transfer of training of visual speech recognition (Bear & Harvey, 2017). Future interactions between psychologists and computer scientists studying multimodal speech recognition could facilitate identification of specific targets for training people and/or computers, perhaps even leading to a new generation of 'smart' hearing aids that use lipreading to enhance automatic speech recognition in noisy environments (e.g., Li et al, 2023).…”
Section: Discussion (mentioning)
confidence: 99%
“…
• Forensics: lip-reading can be used to reconstruct the dialogues in a footage where the audio has been lost or it is noisy.
• Automated Speech Recognition [1]: automakers can integrate lip-reading systems to complement their ASR model in order to understand commands (for example "turn on the A/C") from the driver or the passengers in smart cars when the music's volume is too high. Lip-reading is also necessary in this case to recognize the active speaker in the scene.…”
Section: Introduction (mentioning)
confidence: 99%
“…Multimodal deep learning has emerged as a powerful approach for various tasks by combining information from different modalities, exploiting their complementary nature, and enhancing their overall performance [1-4]. In the realm of speaker recognition, incorporating multiple features, such as lip movements, depth images, and voice, can lead to improved accuracy and robustness in applications such as security systems, access control, and surveillance [2,5-9].…”
Section: Introduction (mentioning)
confidence: 99%