2018
DOI: 10.48550/arxiv.1805.01357
Preprint

Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training

Abstract: In realistic environments, speech is usually interfered by various noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. To alleviate this issue, the commonest way is to use a well-designed speech enhancement approach as the front-end of ASR. However, more complex pipelines, more computations and even higher hardware costs (microphone array) are additionally consumed for this kind of methods. In addition, speech enhancement would result in speech di…

Citations: cited by 6 publications (2 citation statements)
References: 20 publications
“…Within U Net, those pooling and up-sampling operation along the spatial dimension make the receptive field increases exponentially and high way connection allows the combination of different scales features. Inspired by the success of U Net in image segmentation, U-shaped models have been proposed for various acoustic applications, e.g., denoise [14], audio source separation [15].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The first one is using a separation frontend to enhance both training and test sets and retraining the acoustic model with enhanced features [12,13]. The second one is joint-training the front-end enhancement model with the back-end acoustic model [14,15]. The third one is multi-conditional training which performs acoustic modeling on noisy speech and the extracted features are directly fed to the acoustic model for decoding at the test stage.…”
Section: Introduction
Citation type: mentioning
confidence: 99%