2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2011
DOI: 10.1109/icassp.2011.5947347

UT-Scope: Towards LVCSR under Lombard effect induced by varying types and levels of noisy background

Abstract: Adverse environments impact the performance of automatic speech recognition systems in two ways: directly, by introducing acoustic mismatch between the speech signal and acoustic models, and indirectly, by affecting the way speakers communicate to maintain intelligible communication over noise (Lombard effect). Currently, an increasing number of studies have analyzed Lombard effect with respect to speech production and perception, yet limited attention has been paid to its impact on speech systems, especially wi…

Cited by 15 publications (15 citation statements)
References 16 publications
“…Figure 3 demonstrates performance of a neutral speech-trained hidden Markov model (HMM) ASR system when tested on TIMIT-like [32] utterances produced by speakers that were exposed to three levels of a highway (HWY), large crowd (CRD/LCR), and pink noise (PNK) played back through headphones (70, 80, and 90 dB SPL for HWY and CRD; 65, 75, and 85 dB SPL for PNK). A close-talk microphone channel providing high SNR recordings was used in the ASR experiment on 31 US-born subjects (25 females, 6 males) drawn from the UT-Scope Lombard Effect set [9] (see [33] for more details on the ASR experiment). It can be seen that the word error rate (WER) grows rapidly from the baseline no-noise Neutral condition once the speakers are exposed to increasing noise levels, while the recorded speech signal retains a high SNR.…”
Section: Adding Noise Versus Talking In Noise (mentioning)
confidence: 99%
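
The word error rate quoted in the statement above is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name and the whitespace tokenization are assumptions made for illustration, not the scoring setup used in the cited experiments.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper: tokenization is a plain whitespace split, which is
    an assumption and not the scoring protocol of the cited study.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.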
“…In our recent study, RASTALP, a modified RASTA filter that approximates the low-pass portion of the original RASTA by a smoothing low-pass filter [9] and the high-pass portion by CMN or other segment-based normalizations [9,18], was introduced. Compared to the original high-order band-pass RASTA filter, RASTALP is a filter of significantly lower (2nd) order, which helps reduce transient effects typical for RASTA filtering.…”
Section: Temporal Filtering (mentioning)
confidence: 99%
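
The idea described in the statement above, replacing a band-pass filter with mean removal (for the high-pass part) followed by a low-order low-pass smoother, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the cascade of two one-pole smoothers used as the 2nd-order low-pass stage, and the value of `alpha` are all hypothetical and are not the published RASTALP coefficients.

```python
def cepstral_mean_normalize(track):
    """Subtract the segment mean from one cepstral coefficient trajectory."""
    mean = sum(track) / len(track)
    return [x - mean for x in track]


def smooth_second_order(track, alpha=0.5):
    """Second-order low-pass smoother built from two cascaded one-pole stages.

    alpha is a hypothetical smoothing constant (0 <= alpha < 1); it is NOT
    taken from the cited work.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    if not track:
        return []
    # Start both filter states at the first sample to limit start-up transients.
    state1 = state2 = track[0]
    smoothed = []
    for x in track:
        state1 = (1.0 - alpha) * x + alpha * state1
        state2 = (1.0 - alpha) * state1 + alpha * state2
        smoothed.append(state2)
    return smoothed


def rastalp_like(track, alpha=0.5):
    """Mean removal followed by low-pass smoothing (illustrative only)."""
    return smooth_second_order(cepstral_mean_normalize(track), alpha)
```

In use, `rastalp_like` would be applied independently to the time trajectory of each cepstral coefficient within a segment.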
“…Several studies have considered individual effects of room reverberation [2,3,5-7] and increased vocal effort [8,9] on ASR, and reported compensation strategies for their alleviation. However, to the best of our knowledge, this study is the first to consider the individual as well as the combined effects of reverberation and increased vocal effort on ASR.…”
Section: Introduction (mentioning)
confidence: 99%
“…We utilized two different acoustic features: (a) Mel frequency cepstral coefficients (MFCC), normalized by quantile cepstral normalization (QCN) [17] and low-pass RASTA filtering [18], and (b) Rectangular Filter-bank Cepstral Coefficients (RFCC) [17], which are processed through feature warping [19]. All features are 39-dimensional (12 cepstral coefficients + C0 + Δ + ΔΔ).…”
Section: Front-end Processing (mentioning)
confidence: 99%
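
The 39-dimensional layout mentioned above follows from 13 static coefficients per frame (12 cepstral coefficients plus C0) stacked with their first-order (Δ) and second-order (ΔΔ) time derivatives: 13 × 3 = 39. A minimal sketch of that stacking is given below; it uses the common regression formula for deltas with edge frames replicated, and the function names and the window size of 2 are assumptions rather than details confirmed by the cited paper.

```python
def deltas(frames, window=2):
    """Regression-based time derivatives of a list of feature frames.

    d[t] = sum_{n=1..window} n * (c[t+n] - c[t-n]) / (2 * sum n^2),
    with frames beyond the edges replaced by the first or last frame.
    The window size is an illustrative assumption.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                following = frames[min(t + n, num_frames - 1)][k]
                preceding = frames[max(t - n, 0)][k]
                acc += n * (following - preceding)
            row.append(acc / denom)
        result.append(row)
    return result


def stack_with_deltas(static, window=2):
    """Concatenate static, delta, and delta-delta features frame by frame."""
    first = deltas(static, window)
    second = deltas(first, window)
    return [s + d + dd for s, d, dd in zip(static, first, second)]


# Ten frames of 13 static coefficients (12 cepstra + C0), each a linear ramp.
static_frames = [[float(t)] * 13 for t in range(10)]
features = stack_with_deltas(static_frames)
```

Each row of `features` then holds the static block first, followed by the Δ block and the ΔΔ block.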