An Overview of Noise-Robust Automatic Speech Recognition

Li, Jinyu; Deng, Li; Gong, Yifan; Haeb‐Umbach, Reinhold

doi:10.1109/taslp.2014.2304637

Cited by 486 publications

(240 citation statements)

References 209 publications

Mentioning: 239

“…More recently, Li et al (2014) reviewed new techniques that may resolve the issue of sensitivity to noise in voice-controlled systems, and that may soon be implemented in commercially available vehicle systems. With sight of such development, the present study aims to compare a noise-sensitive system that degrades in accuracy due to the presence of background noise to a noise-robust one in terms of user experience and driving performance.…”

Section: Introduction (mentioning)

confidence: 99%

Voice-Controlled In-Vehicle Systems: Effects of Voice-Recognition Accuracy in the Presence of Background Noise

Sokol, Chen, Donmez (2017)

Proceedings of the 9th International Driving Symposium on Human Factors in Driver Assessment, Training, and Vehicle Design: Dri


Summary: This paper presents initial findings from a driving simulator study comparing user responses to a noise-robust voice-controlled system while driving to a noise-sensitive one in the presence of background noise. Twenty participants interacted with both noise-sensitive and noise-robust simulated voice-controlled infotainment systems while driving under three background noise conditions (no noise, music, and children). While both systems were viewed as useful and satisfying, user acceptance was affected by background noise with the noise-sensitive system, but not the noise-robust one. There was also no evidence that user acceptance was calibrated by having background noise as a context for varying levels of accuracy. No significant differences were observed between the two systems in the driving performance metrics analyzed (average speed, speed variability, and standard deviation of lane position), but the use of either system affected driving performance compared to baseline driving. A larger sample size at the end of this study along with the analysis of a larger set of performance metrics will provide further insights.


“…However, modern recognition systems suffer from severe performance degradation in the presence of unavoidable interrupting factors like environment noise, room reverberation, disturbances from different microphones and recording non-linearities [1]. To solve these problems, many processing techniques [2,3,4], including speech enhancement algorithms [5] and new robust acoustic features [6] [7], have been developed to improve recognition performance under low signal-to-noise ratio (SNR) conditions. However these existing approaches, while achieving some improvements, are far from being a comprehensive solution.…”

Section: Introduction (mentioning)

confidence: 99%

Deep neural network for robust speech recognition with auxiliary features from laser-Doppler vibrometer sensor

Xie, McLoughlin, et al. (2016)

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)


Recently, the signal captured from a laser Doppler vibrometer (LDV) sensor has been used to improve the noise robustness of automatic speech recognition (ASR) systems by enhancing the acoustic signal prior to feature extraction. This study proposes another approach in which auxiliary features extracted from the LDV signal are used alongside conventional acoustic features to further improve ASR performance based on the use of a deep neural network (DNN) as the acoustic model. While this approach is promising, the best training data sets for ASR do not include LDV data in parallel with the acoustic signal. Thus, to leverage such existing large-scale speech databases, a regression DNN is designed to map acoustic features to LDV features. This regression DNN is well trained from a limited-size parallel signal data set, then used to form pseudo-LDV features from a massive speech data set for parallel training of an ASR system. Our experiments show that both the features from the limited-scale LDV data set and the massive-scale pseudo-LDV features are able to train an ASR system that significantly outperforms one using acoustic features alone, in both quiet and noisy environments.


“…Each topic had a mean length of 18 words, 55 percent of which were identified as keywords on average. 57 The study was a within-subjects, single-session design with three conditions: Dynamic Screen Display vs. Google Glass vs. Control. Each session lasted approximately 90 minutes.…”

Section: Methods (mentioning)

confidence: 99%

“…Speech corpora generally have low diversity of speakers, therefore acoustic models generated from them might be inaccurate for transcribing speech input from nonnative speakers, speakers with accents, speakers affected with speech impairments [10], or others underrepresented in the corpora, such as older adults and children. Also, recording factors such as noise and other audio distortions can result in lower ASR performance [57].…”

Section: Acoustic Model (mentioning)

confidence: 99%


Speech-based real-time presentation tracking using semantic matching

Asadi


Oral presentations are an essential yet challenging aspect of academic and professional life. To date, many commercial and research products have been developed to provide support for the authoring, rehearsal and delivery of presentations. However, little work has been conducted to provide real-time tracking of a speaker's presentation relative to their supporting media. Given the content of presentation slides and speaking notes, a presentation tracking system uses automatic
