Interspeech 2020
DOI: 10.21437/interspeech.2020-1569
Subword Regularization: An Analysis of Scalability and Generalization for End-to-End Automatic Speech Recognition


Cited by 7 publications (6 citation statements). References: 0 publications.
“…To analyze OOV word recognition performance, we used an F-score metric similar to [22]. The method is based on counting, after decoding, how many times the model emitted (true positives) or did not emit (false negatives) the OOV words from the evaluation set.…”
Section: Methods Description
confidence: 99%
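As a rough illustration of such a metric, the following Python sketch computes an OOV F-score by counting per-utterance emissions. It is a minimal sketch under stated assumptions (whitespace-tokenized transcripts; the oov_fscore helper is hypothetical), not the cited paper's exact procedure; it additionally counts spurious emissions as false positives so that precision is well defined.

    from collections import Counter

    def oov_fscore(references, hypotheses, oov_words):
        """OOV recognition F-score: an OOV occurrence in a reference counts
        as a true positive if the hypothesis emits it and a false negative
        otherwise; extra emissions count as false positives."""
        tp = fp = fn = 0
        for ref, hyp in zip(references, hypotheses):
            ref_counts = Counter(w for w in ref.split() if w in oov_words)
            hyp_counts = Counter(w for w in hyp.split() if w in oov_words)
            for w in oov_words:
                tp += min(ref_counts[w], hyp_counts[w])
                fn += max(ref_counts[w] - hyp_counts[w], 0)
                fp += max(hyp_counts[w] - ref_counts[w], 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)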
“…There are a few previous works on ASR investigating subword augmentation by non-deterministic segmentation. Vanilla subword regularization was studied in [21, 22]. In the first work, the method was applied to the WSJ dataset (English, 50 h).…”
Section: Introduction
confidence: 99%
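For reference, vanilla subword regularization of this kind can be reproduced with the SentencePiece library, whose unigram models support on-the-fly segmentation sampling. A minimal sketch (the model path is a placeholder):

    import sentencepiece as spm

    # Load a trained SentencePiece unigram model (placeholder path).
    sp = spm.SentencePieceProcessor(model_file="unigram.model")

    text = "subword regularization"

    # Deterministic best segmentation, as used at inference time.
    print(sp.encode(text, out_type=str))

    # Sampled segmentations for training-time augmentation: each call draws
    # a different decomposition; alpha smooths the sampling distribution and
    # nbest_size=-1 samples from the full hypothesis lattice.
    for _ in range(3):
        print(sp.encode(text, out_type=str, enable_sampling=True,
                        alpha=0.1, nbest_size=-1))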
“…The method helped prevent over-fitted and over-confident models, and it could distinguish plausible target words from incorrect ones. Subwords are the most widely used output units in E2E ASR systems [84]. The researchers in [81] tested subword regularization with both CTC-based and attention-based ASR models.…”
Section: Vocabulary
confidence: 99%
“…They also showed that uniform greedy sampling of subword units, which is much faster than LSD, was an effective decomposition strategy when combined with an n-gram loss. In [84], the researchers investigated the regularizing influence of subword segmentation sampling on a streaming E2E ASR task. They evaluated how the contribution of subword regularization depended on the training dataset size, and the results suggested that subword regularization provided a consistent reduction in WER.…”
Section: Vocabulary
confidence: 99%
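The exact "uniform greedy sampling" procedure of the cited work is not detailed in this excerpt. One plausible approximation, sketched below under that assumption, is to draw uniformly from the n-best segmentations rather than sampling them in proportion to their model probability:

    import random
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file="unigram.model")  # placeholder path

    def uniform_segmentation(text, nbest_size=8):
        # Uniform choice among the n-best decompositions, rather than
        # probability-proportional sampling.
        candidates = sp.nbest_encode_as_pieces(text, nbest_size)
        return random.choice(candidates)

    print(uniform_segmentation("subword regularization"))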
“…In addition, many works try to leverage multiple modeling units to jointly optimize E2E ASR. Lakomkin et al. [17] point out that combining several segmentations of an utterance transcription in the loss function may be beneficial to the E2E ASR model. Krishna et al. [18] propose joint learning with phoneme and word-piece CTC losses on top of a BiLSTM model.…”
Section: Introduction
confidence: 99%
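A common way to realize such joint optimization is to interpolate two CTC losses computed over a shared encoder. The following PyTorch sketch uses hypothetical dimensions and an equal interpolation weight; it illustrates the general recipe, not the exact model of [18]:

    import torch
    import torch.nn as nn

    class JointCTCModel(nn.Module):
        """Shared BiLSTM encoder with phoneme and word-piece CTC heads."""

        def __init__(self, feat_dim=80, hidden=256, n_phonemes=50,
                     n_wordpieces=1000):
            super().__init__()
            self.encoder = nn.LSTM(feat_dim, hidden, num_layers=3,
                                   bidirectional=True, batch_first=True)
            self.phone_head = nn.Linear(2 * hidden, n_phonemes + 1)  # +1 blank
            self.wp_head = nn.Linear(2 * hidden, n_wordpieces + 1)
            self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)

        def forward(self, feats, feat_lens, phones, phone_lens,
                    wordpieces, wp_lens, weight=0.5):
            enc, _ = self.encoder(feats)  # (B, T, 2 * hidden)
            # CTCLoss expects (T, B, C) log-probabilities.
            phone_logp = self.phone_head(enc).log_softmax(-1).transpose(0, 1)
            wp_logp = self.wp_head(enc).log_softmax(-1).transpose(0, 1)
            loss_phone = self.ctc(phone_logp, phones, feat_lens, phone_lens)
            loss_wp = self.ctc(wp_logp, wordpieces, feat_lens, wp_lens)
            return weight * loss_phone + (1 - weight) * loss_wp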