Interspeech 2015
DOI: 10.21437/interspeech.2015-711

Audio augmentation for speech recognition

Cited by 813 publications (410 citation statements)
References 11 publications
“…We extracted 80-channel log-mel filterbank coefficients computed with a 25-ms window that was shifted every 10ms with Kaldi [19]. We applied three-fold speed perturbation [20] and SpecAugment [21]. The vocabularies were constructed by the byte pair encoding (BPE) algorithm [22] with 10k and 1k units for AED and RNN-T models, respectively.…”
Section: Experimental Evaluations, 4.1 Experimental Setup (mentioning)
confidence: 99%
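The framing in the statement above (a 25 ms window shifted every 10 ms) fixes how many feature frames each utterance yields. The sketch below assumes 16 kHz audio (the statement does not give a sample rate) and Kaldi's default "snip edges" behaviour, where only frames that fit entirely inside the signal are kept; `num_frames` is a hypothetical helper, not a Kaldi API.

```python
def num_frames(num_samples: int, sample_rate: int = 16000,
               window_ms: float = 25.0, shift_ms: float = 10.0) -> int:
    """Count analysis frames, keeping only windows that fit inside the signal.

    Assumes Kaldi-style snip-edges framing; 16 kHz is an assumed default.
    """
    window = int(round(sample_rate * window_ms / 1000))  # 400 samples at 16 kHz
    shift = int(round(sample_rate * shift_ms / 1000))    # 160 samples at 16 kHz
    if num_samples < window:
        return 0  # too short for even one full window
    return 1 + (num_samples - window) // shift


if __name__ == "__main__":
    # One second of 16 kHz audio gives 98 frames, i.e. a 98 x 80 log-mel matrix.
    print(num_frames(16000))
```

With 80 mel channels per frame, each utterance becomes a matrix of `num_frames(...)` rows by 80 columns.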
“…As an upperbound experiment, we compare with a supervised-only setting where we have 650 hours for UK English and 3700 hours for Italian. The supervised data is augmented 3x with speed perturbation [24]. For evaluation, we use a 14 hour test set for UK English and a 20 hour test set for Italian.…”
Section: Data (mentioning)
confidence: 99%
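"Augmented 3x" means three copies of each utterance, but the copies are not all the same length: a copy played at speed factor f lasts 1/f as long as the original. The statement does not list its factors, so the sketch below assumes 0.9, 1.0 and 1.1, the ratios quoted in the next statement. `augmented_hours` is a hypothetical helper.

```python
from typing import Iterable


def augmented_hours(hours: float,
                    factors: Iterable[float] = (0.9, 1.0, 1.1)) -> float:
    """Total training hours after keeping one speed-perturbed copy per factor.

    Assumes factors 0.9/1.0/1.1; a copy at speed f lasts hours / f.
    """
    return sum(hours / f for f in factors)


if __name__ == "__main__":
    for name, hours in [("UK English", 650.0), ("Italian", 3700.0)]:
        print(f"{name}: {hours:.0f} h -> {augmented_hours(hours):.1f} h")
```

Under these assumed factors the total is about 3.02 times the original duration (roughly 1963 h for 650 h), slightly more than 3x because the slowed-down copy is longer than the sped-up copy is shorter.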
“…We used Must-C (Di Gangi et al, 2019), Must-C v2, ST-TED (Jan et al, 2018), Librispeech (Panayotov et al, 2015), and TEDLIUM2 (Rousseau et al, 2012) corpora. We used the cleaned version of ST-TED following […] data was augmented by three-fold speed perturbation (Ko et al, 2015) with speed ratios of 0.9, 1.0, and 1.1 except for Librispeech. We removed case information and punctuation marks except for apostrophes from the transcripts.…”
Section: ASR (mentioning)
confidence: 99%
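Speed perturbation with ratios 0.9, 1.0 and 1.1 can be pictured as resampling the waveform, which changes duration and pitch together (Kaldi-style recipes typically delegate this to SoX's `speed` effect). The sketch below is a deliberately simple stand-in that uses linear interpolation and no anti-aliasing filter, so it is not what SoX does; `speed_perturb` and `three_fold_copies` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def speed_perturb(signal: Sequence[float], factor: float) -> List[float]:
    """Play `signal` `factor` times faster by linear-interpolation resampling.

    factor > 1 shortens the signal (and raises pitch); factor < 1 lengthens it.
    No low-pass filter is applied, so this is only a rough illustration.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(signal)
    if n_in == 0:
        return []
    n_out = max(1, int(round(n_in / factor)))
    out: List[float] = []
    for i in range(n_out):
        pos = i * factor  # read position in the original signal, in samples
        left = int(math.floor(pos))
        if left >= n_in - 1:
            out.append(float(signal[-1]))  # clamp past the last sample
            continue
        frac = pos - left
        out.append((1.0 - frac) * signal[left] + frac * signal[left + 1])
    return out


def three_fold_copies(signal: Sequence[float],
                      factors: Tuple[float, ...] = (0.9, 1.0, 1.1)
                      ) -> Dict[float, List[float]]:
    """Return one perturbed copy of `signal` per speed factor."""
    return {f: speed_perturb(signal, f) for f in factors}


if __name__ == "__main__":
    one_second = [0.0] * 16000  # assumed 16 kHz placeholder signal
    for f, copy in three_fold_copies(one_second).items():
        print(f, len(copy))
```

At factor 1.0 the function returns the input unchanged; at 1.1 a one-second, 16 kHz signal shrinks to 14545 samples and at 0.9 it grows to 17778 samples.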