Proceedings of the 2nd Workshop on Representation Learning for NLP 2017
DOI: 10.18653/v1/w17-2620

Transfer Learning for Speech Recognition on a Budget

Abstract: End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less…
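The adaptation scheme described in the abstract (reusing an English-trained Wav2Letter-style acoustic model and fine-tuning it for German) can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the layer sizes, the checkpoint path, the freezing boundary, and the German grapheme inventory are placeholders, not the authors' released code.

```python
# Sketch: cross-lingual model adaptation for a Wav2Letter-style conv net
# (English -> German) under a small compute/data budget. Illustrative only.
import os
import torch
import torch.nn as nn

GERMAN_GRAPHEMES = 32  # assumed: a-z, umlauts, eszett, space, apostrophe, CTC blank


def build_wav2letter(num_classes: int, in_features: int = 40) -> nn.Sequential:
    """A small Wav2Letter-like stack of 1-D convolutions over filterbank frames."""
    layers, channels = [], in_features
    for out_channels in (256, 256, 256, 512):
        layers += [nn.Conv1d(channels, out_channels, kernel_size=7, padding=3),
                   nn.ReLU()]
        channels = out_channels
    # Final 1x1 convolution maps each frame to grapheme scores.
    layers += [nn.Conv1d(channels, num_classes, kernel_size=1)]
    return nn.Sequential(*layers)


model = build_wav2letter(num_classes=GERMAN_GRAPHEMES)

# In practice the network would be initialised from an English-trained
# checkpoint; the path is a placeholder and loading is skipped if absent.
ckpt = "wav2letter_english.pt"
if os.path.exists(ckpt):
    state = torch.load(ckpt, map_location="cpu")
    model.load_state_dict(state, strict=False)  # output layer shape differs

# Freeze the early convolutions (assumed to capture language-independent
# acoustics); adapt only the upper layers and the new German output layer.
for module in list(model.children())[:-3]:
    for p in module.parameters():
        p.requires_grad = False

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

# One illustrative training step on random tensors standing in for a batch of
# German utterances (log-mel features) and grapheme targets.
features = torch.randn(8, 40, 200)                      # (batch, mel bins, frames)
targets = torch.randint(1, GERMAN_GRAPHEMES, (8, 30))   # grapheme indices, no blank
log_probs = model(features).log_softmax(dim=1)          # (batch, classes, frames)
log_probs = log_probs.permute(2, 0, 1)                  # (frames, batch, classes)
input_lengths = torch.full((8,), 200, dtype=torch.long)
target_lengths = torch.full((8,), 30, dtype=torch.long)

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
optimizer.step()
```

Freezing the lower layers and training only the upper ones is what keeps both the GPU memory footprint and the amount of required German data small; the exact split point is a hyperparameter the paper explores experimentally.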

Cited by 105 publications (57 citation statements)
References 16 publications (17 reference statements)
“…In natural language processing (NLP), unsupervised pre-training of language models (Devlin et al., 2018; Radford et al., 2018) improved many tasks such as text classification, phrase structure parsing and machine translation (Lample & Conneau, 2019). In speech processing, pre-training has focused on emotion recognition (Lian et al., 2018), speaker identification, phoneme discrimination (Synnaeve & Dupoux, 2016a; van den Oord et al., 2018) as well as transferring ASR representations from one language to another (Kunze et al., 2017). There has been work on unsupervised learning for speech, but the resulting representations have not been applied to improve supervised speech recognition (Synnaeve & Dupoux, 2016b; Kamper et al., 2017; Chung et al., 2018; Chen et al., 2018; Chorowski et al., 2019).…”
Section: Introduction (mentioning)
confidence: 99%
“…However, because speech signals are high-dimensional and highly variable even for a single speaker, training deep models and learning these hierarchical representations without a large amount of training data is difficult. The computer vision [15, 16], natural language processing [17-21], and ASR [22-25] communities have attacked the problem of limited supervised training data with great success by pre-training deep models on related tasks for which there is more training data. Following their lead, we propose an efficient ASR-based pre-training methodology in this paper and show that it may be used to improve the performance of end-to-end SLU models, especially when the amount of training data is very small.…”
Section: Introduction (mentioning)
confidence: 99%
“…Among them, few-shot techniques as proposed by [19, 5] have become very popular. Pons et al. [17] proposed a few-shot technique using prototypical networks [5] and transfer learning [20, 21] to solve a different audio task.…”
Section: Related Work (mentioning)
confidence: 99%