2007
DOI: 10.1109/tasl.2007.901836

Transforming Binary Uncertainties for Robust Speech Recognition

Abstract: Recently, several algorithms have been proposed to enhance noisy speech by estimating a binary mask that can be used to select those time-frequency regions of a noisy speech signal that contain more speech energy than noise energy. This binary mask encodes the uncertainty associated with enhanced speech in the linear spectral domain. The use of the cepstral transformation smears the information from the noise dominant time-frequency regions across all the cepstral features. We propose a supervised app…
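
As a minimal illustration of the binary mask the abstract describes, the sketch below marks a time-frequency unit with 1 when its speech energy exceeds its noise energy. It assumes separate speech and noise power spectrograms are available, which is an oracle setting; the algorithms the abstract refers to have to estimate the mask from the noisy signal. The names `binary_mask` and `apply_mask` and the toy values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def binary_mask(speech_power, noise_power):
    """Return 1 where speech energy exceeds noise energy, else 0.

    Both inputs are power spectrograms of identical shape
    (e.g. frequency channels x time frames). Hypothetical helper.
    """
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    if speech_power.shape != noise_power.shape:
        raise ValueError("speech and noise spectrograms must have the same shape")
    return (speech_power > noise_power).astype(np.uint8)


def apply_mask(noisy_power, mask):
    """Keep speech-dominant units of the noisy spectrogram, zero the rest."""
    return np.asarray(noisy_power, dtype=float) * mask


# Toy 2 x 2 example (made-up numbers, for illustration only).
speech = np.array([[4.0, 1.0], [0.5, 3.0]])
noise = np.array([[1.0, 2.0], [0.5, 1.0]])
mask = binary_mask(speech, noise)
masked = apply_mask(speech + noise, mask)
```

Units where the two energies are equal are treated as noise dominant here, since the abstract's criterion is "more speech energy than noise energy".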

Cited by 61 publications (73 citation statements)
References 35 publications

“…We also performed reconstruction using a method described in [10] using a 1024-component diagonal Gaussian mixture model of clean speech. It was noted in our earlier work that this reconstruction method does not outperform direct masking when a bottom-up mask is used.…”
Section: Evaluation Results (mentioning)
confidence: 99%
“…Such methods can be broadly categorized into three groups: 1) extracting robust features like PLP, RASTA [1], and AFE [2], 2) model adaptation techniques like MLLR [3], PMC [4], and Vector Taylor series (VTS) based adaptation [5], and 3) noise suppression or feature enhancement techniques like Wiener filtering [6], VTS-based enhancement [7], and model based feature enhancement [8]. There are also systems that combine the above methods [9,10]. Because of the huge variability in noise in real-life conditions, the level of robustness obtained by these methods is still inadequate.…”
Section: Introduction (mentioning)
confidence: 99%
“…Specifically, we employ a CASA system [24] to segregate speech from noise and obtain a binary mask that indicates reliable or corrupted components of an auditory feature. The auditory feature is then enhanced by reconstructing the corrupted components [18,24,25].…”
Section: CASA-based Robust Speaker Recognition (mentioning)
confidence: 99%
“…Additionally, we estimate reconstruction uncertainties [24,25] and apply them in an uncertainty decoder [6] to calculate speaker likelihoods. This decoder accounts for varied accuracies of the feature enhancement process.…”
Section: CASA-based Robust Speaker Recognition (mentioning)
confidence: 99%