Interspeech 2020
DOI: 10.21437/interspeech.2020-3208

Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition

Abstract: Spoken intent detection has become a popular approach to interface with various smart devices with ease. However, such systems are limited to the preset list of intents-terms or commands, which restricts the quick customization of personal devices to new intents. This paper presents a few-shot spoken intent classification approach with task-agnostic representations via the meta-learning paradigm. Specifically, we leverage the popular representation based meta-learning to build a task-agnostic representatio…

Cited by 8 publications (3 citation statements)
References: 23 publications

“…Are the two types of approaches above complementary? It has been found that the integration of PASE+ [15] and meta-learning including Prototypical network and MetaOptNet [16] improves the keyword spotting performance [17]. But to our best knowledge, it is still unclear whether the effects of SSL and meta-learning are additive in general and independent of specific choices of SSL models or meta-learning algorithms.…”
Section: Train a Multi-class Keyword Classification Model On Librispe... (mentioning)
confidence: 99%
“…Meta-learning is achieved by solving a task-specific ridge regression problem that maps a deep representation to the target TS in closed-form, while the parameters of the representation are learned by backpropagation through the solver. Aside from the original application in few-shot image classification [7], differentiable closed-form solvers have been used for other few-shot problems like visual tracking [47], video object segmentation [24], spoken intent recognition [27] and spatial regression [17], while we are not aware of any application in forecasting.…”
Section: Related Work (mentioning)
confidence: 99%
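
The closed-form ridge regression step described in the statement above can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the method of the cited papers: the function names `ridge_head` and `classify`, the regularization value `lam`, and the toy random embeddings are all hypothetical. It shows only the task-specific solve that maps fixed embeddings to one-hot targets; learning the representation by backpropagating through the solver would need an autodiff framework and is not shown.

```python
import numpy as np

def ridge_head(support_emb, support_labels, n_classes, lam=1.0):
    """Closed-form ridge regression from embeddings to one-hot labels.

    Hypothetical helper: returns a (d, n_classes) weight matrix.
    """
    n, _ = support_emb.shape
    targets = np.eye(n_classes)[support_labels]  # (n, n_classes) one-hot
    # Woodbury form: solve an n x n system, cheap when n (shots) << d.
    gram = support_emb @ support_emb.T + lam * np.eye(n)
    return support_emb.T @ np.linalg.solve(gram, targets)

def classify(query_emb, weights):
    """Pick the class with the largest regression output."""
    return (query_emb @ weights).argmax(axis=1)

# Toy 3-way, 5-shot episode with made-up 16-dimensional embeddings.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 16)) * 5.0
support_labels = np.repeat(np.arange(3), 5)
support_emb = centers[support_labels] + rng.normal(size=(15, 16))
query_emb = centers + rng.normal(size=(3, 16))

W = ridge_head(support_emb, support_labels, n_classes=3)
pred = classify(query_emb, W)
print(pred)
```

The Woodbury form is used here because a few-shot episode has far fewer support examples than embedding dimensions, so the linear system to solve stays small; the primal form `(X^T X + lam I)^{-1} X^T Y` gives the same weights.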
“…A recent development of deep learning has revolutionized various audio-based applications such as emotion recognition (ER) [1], environmental sound classification (ESC) [2], and keyword spotting [3,4]. However, in a real-world setting where deployed audio classification models may need to dynamically incorporate new tasks (i.e., new classes or inputs) from users [5] and changing input distribution [6], current supervised learning approaches are severely limited due to the constrained nature of available resources on the edge devices and the catastrophic forgetting (CF) issue [7].…”
Section: Introduction (mentioning)
confidence: 99%