2015
DOI: 10.1186/s13636-014-0047-0

Noisy training for deep neural networks in speech recognition

Abstract: Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence miserable performance degradation in adverse acoustic conditions such as those with high ambient noises. We propose a noisy training approach to tackle this problem: by injecting moderate noises into the training data intentionally and randomly, more generaliza…
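As a rough sketch of the noisy-training idea described in the abstract (not the authors' actual recipe), the following mixes randomly selected noise segments into clean training utterances at randomly drawn SNRs. The function names, noise pool, and SNR range are illustrative assumptions, and rng is assumed to be a NumPy Generator such as np.random.default_rng().

import numpy as np

def inject_noise(speech, noise, snr_db):
    """Mix a noise signal into a clean utterance at the requested SNR (in dB)."""
    # Tile or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def make_noisy_copies(clean_utterances, noise_pool, rng, snr_range_db=(5, 25)):
    """Corrupt each training utterance with a randomly chosen noise type and SNR."""
    corrupted = []
    for utt in clean_utterances:
        noise = noise_pool[rng.integers(len(noise_pool))]
        snr_db = rng.uniform(*snr_range_db)
        corrupted.append(inject_noise(utt, noise, snr_db))
    return corrupted

The corrupted copies would then be pooled with the clean data when training the acoustic model, so the network sees both clean and noise-injected examples.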

Cited by 87 publications (36 citation statements)
References 37 publications
“…However, as is predictable, too high a value, as in the case of γ = 4, limits improvements due to exaggerated scaling. Finally, it is interesting to note the effect of including noise: the negligible deficit in average performance it produces is counterbalanced by an almost halved standard deviation, which seems to confirm the results presented in [17]-[19], where the main contribution of injecting noise into the training inputs was claimed to be an increased ability of the network to learn more consistent results. Configuration no. 6, i.e., the one with all hyperparameters excluded, performs the worst, confirming once more the positive effect all the proposed hyperparameters have on performance.…”
Section: Evaluation On Test Set (supporting)
confidence: 76%
“…Then we convolve each SRIR with a randomly selected one-second clean speech sample from the Libri ASR corpus [20] to generate realistic reverberant speech recordings in Ambisonic format. Babble and speech-shaped noise [21] are added to the convolved sound at signal-to-noise ratios (SNRs) following a normal distribution centered at 15 dB with a standard deviation of 1 dB, as recommended by [22]. A short-time Fourier transform (STFT) is used to convert the speech waveforms to spectrograms, and the features are extracted according to Section 3.2.…”
Section: Data Preparation (mentioning)
confidence: 99%
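A minimal, single-channel sketch of that data preparation step (the cited setup actually uses Ambisonic SRIRs and babble/speech-shaped noise): convolve the clean utterance with an impulse response, add noise at an SNR drawn from a normal distribution centered at 15 dB with a standard deviation of 1 dB, and take an STFT. The sampling rate, FFT size, and function name are assumptions for illustration; rng is again a NumPy Generator.

import numpy as np
from scipy.signal import fftconvolve, stft

def prepare_noisy_reverberant_example(clean_speech, rir, noise, rng,
                                      snr_mean_db=15.0, snr_std_db=1.0,
                                      fs=16000, n_fft=512):
    """Convolve clean speech with a room impulse response, add noise at an SNR
    drawn from N(15 dB, 1 dB), and return a magnitude spectrogram."""
    # Reverberant speech: convolve the dry utterance with the impulse response.
    reverberant = fftconvolve(clean_speech, rir)[:len(clean_speech)]
    # Draw the target SNR from a normal distribution centered at 15 dB.
    snr_db = rng.normal(snr_mean_db, snr_std_db)
    noise = np.resize(noise, reverberant.shape)
    sig_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    noisy = reverberant + scale * noise
    # STFT -> magnitude spectrogram used as the network input features.
    _, _, spec = stft(noisy, fs=fs, nperseg=n_fft)
    return np.abs(spec)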
“…In contrast to previous NN approaches for the classification of ECG signals, the proposed system trains with both clean and noisy data [55]. By using inputs corrupted with randomly sampled noises at various signal-to-noise ratios, we were able to build a robust classifier without an adaptive filter, because the injected noise improves the generalization capability of the NN model [56]. The rationale of this approach is that the perturbation introduced in training by the injected noise can be learned by the NN structure and recognized in the test phase.…”
Section: Discussion (mentioning)
confidence: 99%
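A comparable noise-injection recipe can be sketched as below, keeping every clean example and adding a corrupted copy for a random subset of them; the corruption fraction and SNR range are placeholder values, not figures from the cited ECG work, and rng is once more a NumPy Generator.

import numpy as np

def build_noisy_training_set(clean_signals, noise_bank, rng,
                             corrupt_fraction=0.5, snr_range_db=(0, 20)):
    """Keep every clean example and, for a random subset, add a copy corrupted
    with a randomly chosen noise type at a randomly sampled SNR."""
    training_set = []
    for sig in clean_signals:
        training_set.append(sig)  # the clean example is always kept
        if rng.random() < corrupt_fraction:
            noise = np.resize(noise_bank[rng.integers(len(noise_bank))], sig.shape)
            snr_db = rng.uniform(*snr_range_db)
            scale = np.sqrt(np.mean(sig ** 2) /
                            (np.mean(noise ** 2) * 10 ** (snr_db / 10) + 1e-12))
            training_set.append(sig + scale * noise)
    return training_set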