2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2015
DOI: 10.1109/asru.2015.7404826

Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge

Cited by 9 publications (15 citation statements)
references
References 8 publications
“…Augmenting the training set by using individual microphone channels or performing semi-supervised adaptation on the test data also yielded consistent improvements on real and simulated data (Yoshioka et al, 2015). By contrast with these results, Vu et al (2015) found that, in the case when MVDR beamforming is applied, training the ASR backend on real data only improves the WER on real test data compared to training on real and simulated data. This confirms that the difference in the characteristics of enhanced real vs. simulated signals induced by MVDR carries over to the ASR backend too.…”
Section: Acoustic Modeling; citation type: contrasting
confidence: 48%
“…As expected, Bagchi et al (2015) and Fujita et al (2015) reported similar performance for these two techniques on real and simulated data. Single-channel enhancement based on nonnegative matrix factorization (NMF) of the power spectra of speech and noise has also been used and resulted in minor improvement on both real and simulated data (Bagchi et al, 2015;Vu et al, 2015).…”
Section: Source Separation; citation type: mentioning
confidence: 99%
“…Vu et al (2015) and Baby et al (2015) employ non-negative matrix factorisation approaches that exploit spectral diversity of speech and noise. Bagchi et al (2015) employ a DNN-based denoising autoencoder and El-Desoky Mousa et al (2015) perform feature denoising using a bidirectional Long Short-Term Memory (BLSTM).…”
Section: Target Enhancement; citation type: mentioning
confidence: 99%
“…The simplest approach has been to apply utterance-based feature mean and variance normalization (Zhao et al, 2015;Fujita et al, 2015;Du et al, 2015;Wang et al, 2015). However, the two most effective techniques are transforming the DNN features using feature-space maximum likelihood linear regression (fMLLR) (Hori et al, 2015;Moritz et al, 2015;Vu et al, 2015;Sivasankaran et al, 2015;Tran et al, unpublished) or augmentation of the DNN features using either i-vectors, (e.g., Moritz et al, 2015;Zhuang et al, 2015), pitch-based features (Ma et al, 2015;Wang et al, 2015;Du et al, 2015) or bottleneck features (Tachioka et al, 2015), i.e., extracted from bottleneck layers in speaker classification DNNs. Where i-vectors have been used they may be either per-speaker (e.g., Prudnikov et al, 2015) or per-speaker-environment, (e.g.…”
Section: Feature Design; citation type: mentioning
confidence: 99%
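
The "utterance-based feature mean and variance normalization" mentioned in the last statement above can be illustrated with a minimal sketch. This is not taken from any of the cited systems: the function name `cmvn_utterance`, the `eps` guard, and the toy input are assumptions made purely for illustration. Each feature dimension is shifted to zero mean and scaled to unit variance using statistics computed over a single utterance.

```python
import math


def cmvn_utterance(frames, eps=1e-8):
    """Per-utterance mean and variance normalization (illustrative sketch).

    frames: list of feature frames, each a list of floats of equal length.
    eps:    small constant (an assumption) that avoids division by zero
            for dimensions that are constant across the utterance.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Statistics are computed over all frames of this one utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / n
                 for d in range(dim)]
    stds = [math.sqrt(v) + eps for v in variances]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)]
            for f in frames]


# Toy utterance: 3 frames, 2 feature dimensions (made-up values).
utt = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn_utterance(utt)
```

After normalization both dimensions of the toy utterance have the same scale, which is the point of the technique: differences in feature offset and gain between utterances are removed before the features reach the acoustic model.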