2010
DOI: 10.1155/2010/651420

Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions

Cited by 23 publications (42 citation statements)
References 10 publications
“…As the estimates of the source images y_{1j,n} are complex-valued Gaussian, the magnitude spectrum (i.e., the absolute value of the source images) follows a Rice distribution [38]. For the second non-linearity, we assume the log-normality of the Mel features and use the lognormal transform given in [46].…”
Section: Moment Matching (mentioning)
confidence: 99%
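The log-normality assumption in the statement above can be illustrated with a short moment-matching sketch. This is not the code of the citing paper or of its reference [46]; it only shows the standard identities that map the mean and variance of a positive, log-normally distributed quantity to the mean and variance of its logarithm. The function name `lognormal_moment_match` and the sample values are hypothetical.

```python
import numpy as np


def lognormal_moment_match(mean, var):
    """Map the mean/variance of a positive feature to the mean/variance
    of its logarithm, assuming the feature is log-normally distributed.

    Hypothetical helper for illustration only.
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    # Variance of log(x): log(1 + var / mean^2)
    log_var = np.log1p(var / mean**2)
    # Mean of log(x): log(mean) - log_var / 2
    log_mean = np.log(mean) - 0.5 * log_var
    return log_mean, log_var


# Made-up Mel-domain means and variances (assumed values, not real data).
mel_mean = np.array([1.0, 2.0, 4.0])
mel_var = np.array([0.0, 1.0, 4.0])
log_mean, log_var = lognormal_moment_match(mel_mean, mel_var)
```

A zero input variance yields a zero log-domain variance, so a point estimate passes through as a plain logarithm.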
“…This approach was introduced for noise-robust automatic speech recognition [36]-[42] and it has also been used for noise-robust speaker identification [43], [44] and singer identification in polyphonic music [45]. While there exist techniques to propagate uncertainty from the separated signal to the features based on moment matching [46], unscented transform [38], or Vector Taylor series (VTS) [47], the estimation of uncertainty on the separated signal remains a difficult problem. A heuristic is to assume that the uncertainty is proportional to the squared difference between the separated target and the mixture in the time-frequency domain [38].…”
Section: Introduction (mentioning)
confidence: 99%
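The heuristic described at the end of the statement above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from reference [38]: the proportionality constant `alpha`, the function name `heuristic_uncertainty`, and the toy spectrogram values are all hypothetical.

```python
import numpy as np


def heuristic_uncertainty(mixture_stft, target_stft, alpha=1.0):
    """Per-bin uncertainty taken as proportional to the squared magnitude
    of the difference between the mixture and the separated target.

    `alpha` is an assumed tuning constant; the source does not specify one.
    """
    diff = np.asarray(mixture_stft) - np.asarray(target_stft)
    return alpha * np.abs(diff) ** 2


# Toy 2x2 complex time-frequency representations (made-up values).
mixture = np.array([[1 + 1j, 2 + 0j], [0 + 0j, 1 - 1j]])
target = np.array([[1 + 1j, 1 + 0j], [0 + 1j, 0 + 0j]])
uncertainty = heuristic_uncertainty(mixture, target, alpha=0.5)
```

Bins where the separated target equals the mixture get zero uncertainty, and bins where separation removed a lot of energy get a large one.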
“…Robust ASR approaches [1] may be classified as model compensation [2], feature compensation [3] or hybrid techniques [4][5][6]. Uncertainty decoding [7][8][9][10][11][12][13][14] has emerged as a promising hybrid technique whereby speech enhancement is applied to the input noisy signal and the enhanced features are not considered as point estimates but as a Gaussian distribution with time-varying variance or uncertainty that is used to dynamically adapt the acoustic model on each time frame for decoding. Uncertainty decoding may be used with feature-domain or spectral-domain enhancement.…”
Section: Introduction (mentioning)
confidence: 99%
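One common way to realize the per-frame adaptation described in the statement above is to add the feature uncertainty to the variance of each acoustic-model Gaussian before scoring the frame. The sketch below assumes diagonal covariances and a single Gaussian; it is an illustration of that general idea, not the method of any specific cited paper, and the function name and numbers are hypothetical.

```python
import numpy as np


def uncertain_log_likelihood(feat_mean, feat_var, model_mean, model_var):
    """Log-likelihood of an uncertain feature under a diagonal Gaussian.

    The feature is treated as Gaussian with mean `feat_mean` and variance
    `feat_var`; marginalizing over it adds `feat_var` to the model variance.
    """
    total_var = np.asarray(model_var, dtype=float) + np.asarray(feat_var, dtype=float)
    sq_err = (np.asarray(feat_mean, dtype=float) - np.asarray(model_mean, dtype=float)) ** 2
    return -0.5 * np.sum(np.log(2.0 * np.pi * total_var) + sq_err / total_var, axis=-1)


# A frame that mismatches the model (assumed toy values).
feat = np.array([0.0, 0.0])
model_mean = np.array([3.0, 3.0])
model_var = np.array([1.0, 1.0])

ll_point = uncertain_log_likelihood(feat, np.zeros(2), model_mean, model_var)
ll_uncertain = uncertain_log_likelihood(feat, np.full(2, 8.0), model_mean, model_var)
```

With zero uncertainty the score reduces to the ordinary Gaussian log-likelihood. With a large uncertainty the mismatched frame is penalized less, which is the intended effect of treating unreliable features as distributions rather than point estimates.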
“…We adopt the latter approach, as it benefits from multichannel information and it has led to the best ASR accuracy in a real domestic environment as evaluated by the CHiME Challenge [15]. Following [9,10,13], we estimate the uncertainty in the spectral domain and we subsequently propagate it to the feature domain.…”
Section: Introduction (mentioning)
confidence: 99%