Interspeech 2020
DOI: 10.21437/interspeech.2020-1160

Improving End-to-End Speech-to-Intent Classification with Reptile

Abstract: End-to-end spoken language understanding (SLU) systems have many advantages over conventional pipeline systems, but collecting in-domain speech data to train an end-to-end system is costly and time consuming. One question arises from this: how to train an end-to-end SLU with limited amounts of data? Many researchers have explored approaches that make use of other related data resources, typically by pre-training parts of the model on high-resource speech recognition. In this paper, we suggest improving the gen…
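The paper's title names Reptile, a first-order meta-learning algorithm: repeatedly adapt a copy of the model to a sampled task with a few SGD steps, then nudge the shared initialization toward the adapted weights. Below is a minimal illustrative sketch on a toy one-parameter regression family; the task family, hyperparameters, and loss are invented for illustration and are not from the paper.

```python
import numpy as np

# Toy Reptile sketch. Each "task" is a 1-D regression y = a * x with a
# task-specific slope a; the meta-learned initialization should drift
# toward the mean slope of the task family. All hyperparameters here
# (inner steps, learning rates, slopes) are illustrative assumptions.

rng = np.random.default_rng(0)

def inner_sgd(theta, a, steps=10, lr=0.05):
    """A few SGD steps on the task with slope `a`, starting from theta."""
    w = theta
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0, size=8)
        y = a * x
        grad = np.mean(2 * (w * x - y) * x)  # d/dw of MSE for y_hat = w * x
        w -= lr * grad
    return w

def reptile(theta=0.0, meta_steps=200, eps=0.5, slopes=(1.0, 3.0)):
    """Reptile outer loop: move theta toward the task-adapted weights."""
    for _ in range(meta_steps):
        a = rng.choice(slopes)           # sample a task
        w_task = inner_sgd(theta, a)     # adapt on that task
        theta += eps * (w_task - theta)  # Reptile meta-update
    return theta

theta_meta = reptile()
```

With slopes 1.0 and 3.0, `theta_meta` settles near the mean slope 2.0, which is the intuition behind Reptile: the meta-update averages the directions in which individual tasks pull the initialization.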

Cited by 16 publications (11 citation statements). References 15 publications (27 reference statements).
“…Meta-learning has been applied in computer vision research and achieved great success in image classification [5,6,7,8]. Meanwhile, several natural language and speech processing tasks also adopt meta-learning and attain promising results, such as neural machine translation [9], dialogue generation [10], text classification [11], word sense disambiguation [12], as well as speaker adaptive training [13], speech-to-intent classification [14], code-switched speech recognition [15], and speech recognition [16].…”
Section: Introduction (confidence: 99%)
“…Rather than containing discrete ASR and NLU modules, E2E SLU models are trained to infer the utterance semantics directly from the spoken signal [13][14][15][16][17][18][19][20]. These models are trained to maximize the SLU prediction accuracy where the predicted semantic targets vary from just the intent [21,22], to a full interpretation with domain, intents, and slots [13].…”
Section: Introduction (confidence: 99%)
“…A similar collection of French spoken NER and slot filling datasets has been investigated [26]. Over the last year the state-of-the-art on FSC has progressed to over 99% test set accuracy for several E2E approaches [14][15][16][17][18][19][20]. However, there remains a gap between the capabilities demonstrated thus far and the E2E SLU requirements for a generalized VA [27].…”
Section: Introduction (confidence: 99%)
“…Recently there has been a significant effort to build end-to-end (E2E) models for spoken language understanding [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]. Instead of using an ASR system in tandem with a text-based natural language understanding system [2,16,17], these systems directly process speech to produce spoken language understanding (SLU) entity or intent label targets.…”
Section: Introduction (confidence: 99%)