Interspeech 2018
DOI: 10.21437/interspeech.2018-2191
An Investigation of Mixup Training Strategies for Acoustic Models in ASR

Abstract: Mixup is a recently proposed technique that creates virtual training examples by combining existing ones. It has been successfully used in various machine learning tasks. This paper focuses on applying mixup to automatic speech recognition (ASR). More specifically, several strategies for acoustic model training are investigated, including both conventional cross-entropy and novel lattice-free MMI models. Considering mixup as a method of data augmentation as well as regularization, we compare it with widely use…
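The core idea the abstract describes, creating a virtual example by linearly combining two existing ones, can be sketched as follows. This is a generic illustration of mixup on feature/label pairs, not the paper's specific Kaldi implementation; the function name and `alpha` default are assumptions.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Create one virtual training example from two existing ones.

    x1, x2: feature vectors (e.g. acoustic frames), same shape.
    y1, y2: soft/one-hot target vectors, same shape.
    alpha:  Beta-distribution parameter controlling interpolation strength.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing weight drawn from Beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2     # interpolate features
    y = lam * y1 + (1.0 - lam) * y2     # interpolate targets with the same weight
    return x, y
```

Because the targets are mixed with the same weight as the features, the resulting soft label still sums to one, which is what lets the virtual example act as both augmentation and regularization.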


Cited by 33 publications (17 citation statements)
References 20 publications (29 reference statements)
“…39phone and 84phone denote the number of phones used to train the model. Attention and mixup [22] are the methods used when training the model. From Table 2, we can see that the performances of the different networks are similar.…”
Section: Acoustic Models
confidence: 99%
“…This approach performs on-the-fly generation of virtual training examples by combining the existing ones. We used Kaldi-compatible implementation 2 described in our previous paper [29]. Figure 1 presents the preparation of 3 datasets D1, D2, and D3, used for training of the acoustic models.…”
Section: Signal Enhancement and Data Augmentation
confidence: 99%
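The "on-the-fly generation of virtual training examples" mentioned above is commonly realized by mixing each minibatch with a shuffled copy of itself during data loading. The sketch below illustrates that scheme; it is an assumption for illustration and may differ in detail from the Kaldi-compatible implementation cited in [29].

```python
import numpy as np

def mixup_batches(batches, alpha=0.2, rng=None):
    """Yield virtual minibatches on the fly.

    batches: iterable of (x, y) numpy arrays, where x is a batch of
             feature vectors and y the matching batch of soft targets.
    Each batch is mixed with a random permutation of itself, so no
    extra data needs to be stored.
    """
    rng = rng or np.random.default_rng()
    for x, y in batches:
        lam = rng.beta(alpha, alpha)          # one mixing weight per batch
        perm = rng.permutation(len(x))        # pair each example with a random partner
        yield (lam * x + (1.0 - lam) * x[perm],
               lam * y + (1.0 - lam) * y[perm])
```

Mixing a batch with a permutation of itself keeps the generator stateless and streaming-friendly, which is why it suits on-the-fly augmentation pipelines.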
“…• AM1: Time Delay Neural Network (TDNN) [33] trained with LF-MMI criterion [34] on D1 dataset and SDBN features. Training was performed with mixup data augmentation [29] and backstitch regularization [35].…”
Section: Final Acoustic Models
confidence: 99%
“…Weights for the combination are proportional to ξ α and (1 − ξ) α , where ξ and (1 − ξ) are the same weights that are applied for the corresponding feature vectors, and α is a scaling factor 1 . Other ways to apply mixup for ASR can be found in [39].…”
Section: Generate a New Synthetic Sequence of Input Vectors
confidence: 99%
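The weighting scheme quoted above, with combination weights proportional to ξ^α and (1 − ξ)^α, can be sketched as follows. The function name and renormalization step are assumptions for illustration; the citing paper defines only the proportionality.

```python
import numpy as np

def mix_targets(y1, y2, xi, alpha=1.0):
    """Combine two supervision targets with exponent-scaled weights.

    xi:    the same weight applied to the corresponding feature vectors.
    alpha: scaling factor; alpha = 1 reduces to plain linear interpolation.
    Weights are renormalized so the mixed target still sums to one
    (assumed here; the source states only proportionality).
    """
    w1 = xi ** alpha
    w2 = (1.0 - xi) ** alpha
    s = w1 + w2
    return (w1 * y1 + w2 * y2) / s
```

With α > 1 the larger of the two weights is amplified after renormalization, pushing the mixed target closer to the dominant example; α = 1 recovers the standard mixup label interpolation.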