Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions

Kolossa, Dorothea; Astudillo, Ramón Fernandez; Hoffmann, Eugen; Orglmeister, Reinhold

doi:10.1155/2010/651420

Cited by 23 publications

(42 citation statements)

References 10 publications


“…As the estimates of the source images y_{1j,n} are complex-valued Gaussian, the magnitude spectrum (i.e., the absolute value of the source images) follows a Rice distribution [38]. For the second non-linearity, we assume the log-normality of the Mel features and use the lognormal transform given in [46].…”

Section: ) Moment Matching (mentioning)

confidence: 99%
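The log-normality assumption in the statement above can be illustrated with standard moment matching. This is a minimal sketch, not the transform from [46] itself: it assumes a single Mel feature with known mean and variance and returns the mean and variance of its logarithm under a log-normal model. The function name `lognormal_to_log_moments` is hypothetical.

```python
import math


def lognormal_to_log_moments(mean, var):
    """Return (mean, variance) of log(X) for a log-normal X.

    Assumption: X is exactly log-normal with the given mean and variance.
    Uses the standard identities
        Var[log X] = log(1 + var / mean^2)
        E[log X]   = log(mean) - 0.5 * Var[log X]
    """
    if mean <= 0.0:
        raise ValueError("mean of a log-normal variable must be positive")
    if var < 0.0:
        raise ValueError("variance must be non-negative")
    log_var = math.log1p(var / (mean * mean))
    log_mean = math.log(mean) - 0.5 * log_var
    return log_mean, log_var
```

With zero variance the result reduces to the plain logarithm of the mean, which is the point-estimate case.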

“…This approach was introduced for noise-robust automatic speech recognition [36]-[42] and it has also been used for noise-robust speaker identification [43], [44] and singer identification in polyphonic music [45]. While there exist techniques to propagate uncertainty from the separated signal to the features based on moment matching [46], unscented transform [38], or Vector Taylor series (VTS) [47], the estimation of uncertainty on the separated signal remains a difficult problem. A heuristic is to assume that the uncertainty is proportional to the squared difference between the separated target and the mixture in the time-frequency domain [38].…”

Section: Introduction (mentioning)

confidence: 99%

“…While there exist techniques to propagate uncertainty from the separated signal to the features based on moment matching [46], unscented transform [38], or Vector Taylor series (VTS) [47], the estimation of uncertainty on the separated signal remains a difficult problem. A heuristic is to assume that the uncertainty is proportional to the squared difference between the separated target and the mixture in the time-frequency domain [38]. In [40], [41], [48], more principled uncertainty estimators were proposed whose mean and variance are derived from ML estimates of the parameters of the source models.…”

Section: Introduction (mentioning)

confidence: 99%
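The heuristic quoted in the statements above (uncertainty proportional to the squared difference between the separated target and the mixture) can be sketched as follows. This is an illustration under stated assumptions, not the estimator of [38]: spectrograms are nested lists of complex time-frequency coefficients, and the proportionality constant `alpha` and the function name `heuristic_uncertainty` are hypothetical.

```python
def heuristic_uncertainty(mixture, separated, alpha=1.0):
    """Per-bin uncertainty: alpha * |mixture - separated|^2.

    mixture, separated: lists of frames, each a list of complex
    time-frequency coefficients with matching shapes.
    alpha: assumed proportionality constant (not taken from the source).
    """
    if len(mixture) != len(separated):
        raise ValueError("mixture and separated must have the same number of frames")
    uncertainty = []
    for mix_frame, sep_frame in zip(mixture, separated):
        if len(mix_frame) != len(sep_frame):
            raise ValueError("frames must have the same number of bins")
        uncertainty.append(
            [alpha * abs(x - s) ** 2 for x, s in zip(mix_frame, sep_frame)]
        )
    return uncertainty
```

Bins where the separation removed a lot of energy receive a large variance, so later stages can rely on them less.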


Variational Bayesian Inference for Source Separation and Robust Feature Extraction

Adiloğlu

2016

IEEE/ACM Trans. Audio Speech Lang. Process.


We consider the task of separating and classifying individual sound sources mixed together. The main challenge is to achieve robust classification despite residual distortion of the separated source signals. A promising paradigm is to estimate the uncertainty about the separated source signals and to propagate it through the subsequent feature extraction and classification stages. We argue that variational Bayesian (VB) inference offers a mathematically rigorous way of deriving uncertainty estimators, which contrasts with state-of-the-art estimators based on heuristics or on maximum likelihood (ML) estimation. We propose a general VB source separation algorithm, which makes it possible to jointly exploit spatial and spectral models of the sources. This algorithm achieves 6% and 5% relative error reduction compared to ML uncertainty estimation on the CHiME noise-robust speaker identification and speech recognition benchmarks, respectively, and it opens the way for more complex VB approximations of uncertainty.


“…Robust ASR approaches [1] may be classified as model compensation [2], feature compensation [3] or hybrid techniques [4][5][6]. Uncertainty decoding [7][8][9][10][11][12][13][14] has emerged as a promising hybrid technique whereby speech enhancement is applied to the input noisy signal and the enhanced features are not considered as point estimates but as a Gaussian distribution with time-varying variance or uncertainty that is used to dynamically adapt the acoustic model on each time frame for decoding. Uncertainty decoding may be used with feature-domain or spectral-domain enhancement.…”

Section: Introduction (mentioning)

confidence: 99%
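The dynamic adaptation described in the statement above is commonly realized by adding the per-frame feature variance to the acoustic model variance before scoring. The sketch below assumes a single diagonal-covariance Gaussian and a Gaussian feature estimate; it is not taken from the cited papers, and the function name `uncertain_log_likelihood` is hypothetical.

```python
import math


def uncertain_log_likelihood(feature_mean, feature_var, model_mean, model_var):
    """Log-likelihood of an uncertain feature under a diagonal Gaussian.

    Assumption: the enhanced feature is Gaussian with mean feature_mean and
    per-dimension variance feature_var. Marginalizing over it inflates the
    model variance to model_var + feature_var in each dimension.
    """
    total = 0.0
    for y, u, mu, s in zip(feature_mean, feature_var, model_mean, model_var):
        v = s + u  # variance compensated for this frame
        total += -0.5 * (math.log(2.0 * math.pi * v) + (y - mu) ** 2 / v)
    return total
```

With zero feature variance this reduces to conventional decoding. A feature far from the model mean is penalized less when its uncertainty is large, which is the intended effect on unreliable frames.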

“…We adopt the latter approach, as it benefits from multichannel information and it has led to the best ASR accuracy in a real domestic environment as evaluated by the CHiME Challenge [15]. Following [9,10,13], we estimate the uncertainty in the spectral domain and we subsequently propagate it to the feature domain.…”

Section: Introduction (mentioning)

confidence: 99%

Fusion of multiple uncertainty estimators and propagators for noise robust ASR

Trung

Jouvet

2014

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Uncertainty decoding has been successfully used for speech recognition in highly nonstationary noise environments. Yet, accurate estimation of the uncertainty on the denoised signals and propagation to the features remain difficult. In this work, we propose to fuse the uncertainty estimates obtained from different uncertainty estimators and propagators by linear combination. The fusion coefficients are optimized by minimizing a measure of divergence with oracle estimates on development data. Using the Kullback-Leibler divergence, we obtain 18% relative error rate reduction on the 2nd CHiME Challenge with respect to conventional decoding, that is about twice as much as the reduction achieved by the best single uncertainty estimator and propagator.


Uncertainty-based learning of acoustic models from noisy data

Ozerov

Lagrange

2013

Computer Speech & Language


Revised version including a bugfix in the computation of the Wiener uncertainty estimator and in the corresponding numerical results in Tables 1, 2, 3, E.6, E.7 and in Figure 6 compared to the original version published by Elsevier. We consider the problem of acoustic modeling of noisy speech data, where the uncertainty over the data is given by a Gaussian distribution. While this uncertainty has been exploited at the decoding stage via uncertainty decoding, its usage at the training stage remains limited to static model adaptation. We introduce a new Expectation Maximisation (EM) based technique, which we call uncertainty training, that allows us to train Gaussian mixture models (GMMs) or hidden Markov models (HMMs) directly from noisy data with dynamic uncertainty. We evaluate the potential of this technique for a GMM-based speaker recognition task on speech data corrupted by real-world domestic background noise, using a state-of-the-art signal enhancement technique and various uncertainty estimation techniques as a front-end. Compared to conventional training, the proposed training algorithm results in 1% to 2% absolute improvement in speaker recognition accuracy by training from either matched, unmatched or multi-condition noisy data. This algorithm is also applicable with minor modifications to maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR) acoustic model adaptation from noisy data and to other data than audio.
