2010
DOI: 10.1109/tasl.2009.2034770

Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments

Abstract: In the presence of environmental noise, speakers tend to adjust their speech production in an effort to preserve intelligible communication. These noise-induced speech adjustments, called the Lombard effect (LE), are known to severely impact the accuracy of automatic speech recognition (ASR) systems. The reduced performance results from the mismatch between the ASR acoustic models, typically trained on noise-clean neutral (modal) speech, and the actual parameters of noisy LE speech. In this study, novel unsup…

Cited by 70 publications (44 citation statements)
References 49 publications
“…Relevant examples include level and/or frequency equalization techniques which attempt to transform the speech-plus-noise signal so that its features mimic a set of reference features calculated in the temporal, spectral, or cepstral domains. One line of work equalizes the noisy input signal to reflect the characteristics of the clean speech used to train the ASR system (Hilger and Ney, 2006; Joshi et al., 2011), while other work has focused on undoing the characteristics of Lombard speech (Boril and Hansen, 2010). Other techniques involve more complex models of intelligibility with the explicit goal of enhancing some intelligibility metric and may operate on clean speech prior to the addition of noise (Chanda and Park, 2007).…”
Section: B Comparison With Other Methods (mentioning)
confidence: 99%
“…Previous studies on robust ASR for stressed speech and Lombard effect speech have reported performance gains when altering configurations of the front-end feature extraction filter banks [26][27][28]. Inspired by [28], our first step is to replace the Mel and Bark filter banks (FB) in MFCC and PLP by a bank of triangular and rectangular filters uniformly distributed over a linear frequency axis.…”
Section: Modified Front-ends (mentioning)
confidence: 99%
“…Inspired by [28], our first step is to replace the Mel and Bark filter banks (FB) in MFCC and PLP by a bank of triangular and rectangular filters uniformly distributed over a linear frequency axis. In the case of the triangular bank, the band cutoffs are located at the center frequencies of the adjacent filters while the rectangular filters are stacked next to each other without overlap as in [27].…”
Section: Modified Front-ends (mentioning)
confidence: 99%
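
The filter-bank layout described in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the function name `linear_filter_bank`, its parameters, and the choice of placing the outermost triangular cutoffs at 0 Hz and the Nyquist frequency are assumptions made here for illustration only.

```python
import numpy as np


def linear_filter_bank(n_filters, n_fft, sample_rate, shape="triangular"):
    """Hypothetical helper: filters uniformly spaced on a linear frequency axis.

    Returns an array of shape (n_filters, n_fft // 2 + 1) that can be
    multiplied with a power spectrum to obtain band energies.
    """
    n_bins = n_fft // 2 + 1
    nyquist = sample_rate / 2.0
    freqs = np.linspace(0.0, nyquist, n_bins)  # center frequency of each FFT bin
    bank = np.zeros((n_filters, n_bins))

    if shape == "triangular":
        # n_filters + 2 equally spaced points: filter i rises from points[i],
        # peaks at points[i + 1], and falls to points[i + 2], so its cutoffs
        # coincide with the centers of the adjacent filters.
        points = np.linspace(0.0, nyquist, n_filters + 2)
        for i in range(n_filters):
            lo, center, hi = points[i], points[i + 1], points[i + 2]
            rising = (freqs - lo) / (center - lo)
            falling = (hi - freqs) / (hi - center)
            bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    elif shape == "rectangular":
        # n_filters + 1 equally spaced edges: bands are stacked side by side,
        # and every FFT bin is assigned to exactly one band (no overlap).
        edges = np.linspace(0.0, nyquist, n_filters + 1)
        band = np.searchsorted(edges, freqs, side="right") - 1
        band = np.minimum(band, n_filters - 1)  # put the Nyquist bin in the last band
        bank[band, np.arange(n_bins)] = 1.0
    else:
        raise ValueError("shape must be 'triangular' or 'rectangular'")
    return bank
```

For example, `linear_filter_bank(7, 512, 16000)` gives seven triangular filters with centers at multiples of 1 kHz, and passing `shape="rectangular"` gives seven non-overlapping bands that together cover every FFT bin once.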
“…The fundamental frequency of speech (F0) is known to be affected by stress [19,20], emotions [19,21], and talking styles [22]. Different languages may exhibit unique F0 characteristics [23] and the same may be observed also for individual dialects of a language [24].…”
Section: Fundamental Frequency Analysis (mentioning)
confidence: 99%