Interspeech 2016
DOI: 10.21437/interspeech.2016-1386
Two-Stage Data Augmentation for Low-Resourced Speech Recognition

Abstract: Low-resourced languages suffer from limited training data and resources. Data augmentation is a common approach to increasing the amount of training data: additional data is synthesized by manipulating the original data with a variety of methods. Unlike most previous work, which focuses on a single technique, we combine multiple, complementary augmentation approaches. The first stage adds noise and perturbs the speed of additional copies of the original audio. The data is further augmented in a second stage, whe…
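The first-stage, signal-level augmentation described in the abstract (additive noise plus speed perturbation) can be sketched as below. This is a minimal illustration, not the paper's implementation: the function name, the use of linear-interpolation resampling, and the Gaussian noise model are all assumptions; the original work likely uses standard tooling (e.g. sox) and recorded noise sources.

```python
import numpy as np

def augment_signal(audio, sr, speed_factor=1.1, noise_snr_db=20.0, rng=None):
    """Illustrative stage-one augmentation: speed perturbation via
    resampling, then additive Gaussian noise at a target SNR.
    All parameter choices here are placeholders, not from the paper."""
    if rng is None:
        rng = np.random.default_rng()
    # Speed perturbation: resample the waveform by linear interpolation,
    # shortening or lengthening it by the multiplicative factor.
    n_out = int(round(len(audio) / speed_factor))
    idx = np.linspace(0, len(audio) - 1, n_out)
    perturbed = np.interp(idx, np.arange(len(audio)), audio)
    # Additive noise scaled to the requested signal-to-noise ratio.
    sig_power = np.mean(perturbed ** 2)
    noise_power = sig_power / (10 ** (noise_snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=perturbed.shape)
    return perturbed + noise
```

Each augmented copy thus differs from the original in both duration and noise content, which is what makes the copies informative rather than redundant for training.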

Cited by 37 publications (16 citation statements)
References 22 publications
“…To address the lack of training data, however, a number of modifications have been made to the standard pipeline. These approaches include data augmentation [23,5,14]; the use of web data [20]; extensive system combination [31]; and the use of multiple languages [4,24]. Furthermore, the concept of low resource can be applied beyond the availability of training data to include linguistic resources such as an accurate lexicon.…”
Section: Low-Resource Speech Recognition
confidence: 99%
“…In [2], data augmentation by creating multiple versions of the original signal with various speed factors has been shown to improve ASR performance across various tasks. This approach is used more elaborately in [3], where, in addition to data augmentation at the signal level with noise addition and speed perturbation, data is augmented in a second stage at the feature level with an fMLLR-based technique applied to bottleneck features. Similarly, in [4], data augmentation is performed at the feature level via vocal tract length perturbation (VTLP) and stochastic feature mapping (SFM) to improve ASR performance in low-resource settings.…”
Section: Related Work
confidence: 99%
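The second-stage, feature-level augmentation mentioned above applies speaker-style linear transforms to features. As a loose sketch of the idea only: fMLLR applies an affine transform x' = Ax + b to each feature frame. The function name below and the arbitrary A and b are illustrative assumptions; real fMLLR transforms are estimated per speaker by an ASR toolkit such as Kaldi, not drawn by hand.

```python
import numpy as np

def affine_feature_transform(feats, A, b):
    """Apply an affine transform (in the spirit of fMLLR, x' = A x + b)
    to a matrix of frame features, e.g. bottleneck outputs.
    feats: (n_frames, dim); A: (dim, dim); b: (dim,).
    Illustrative sketch only; A and b must come from a real estimator."""
    return feats @ A.T + b
```

Applying several such transforms to the same utterance yields multiple feature-level "views" of it, which is how the second stage multiplies the training data without touching the audio.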
“…Following an approach introduced in Ko et al [2] and used by Hartmann et al [3], we perform speed perturbation to generate modified copies of the source audio. This approach modifies the speed of each file by a multiplicative factor drawn at random uniformly between 0.9 and 1.1.…”
Section: Speed Modification
confidence: 99%
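The per-file random draw described above is simple to sketch. This is an illustrative helper, not from the cited works (which apply the factor with audio tooling such as sox); the function name and dict-based return are assumptions.

```python
import random

def speed_factors(filenames, low=0.9, high=1.1, seed=0):
    """Draw one multiplicative speed factor per file, uniformly in
    [low, high], matching the speed-modification step described.
    Illustrative only; applying the factor to audio is a separate step."""
    rng = random.Random(seed)
    return {name: rng.uniform(low, high) for name in filenames}
```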
“…The amounts of their training and test data are listed in Table 1. For each language, the training data is doubled with one augmented copy created by varying the speed and adding Babel noises, as described in [21]. In this paper we report STT system performance (in terms of WER, word error rate) measured on the test sets given in Table 1.…”
Section: Training TDNN-HMM Hybrid Systems
confidence: 99%