Representation Based Meta-Learning for Few-Shot Spoken Intent Recognition

Mittal, Anshul; Bharadwaj, Samarth; Khare, Shreya; Chemmengath, Saneem Ahmed; Sankaranarayanan, Karthik; Kingsbury, Brian

doi:10.21437/interspeech.2020-3208

Cited by 8 publications

(3 citation statements)

References 23 publications

“…Are the two types of approaches above complementary? It has been found that the integration of PASE+ [15] and meta-learning, including Prototypical network and MetaOptNet [16], improves the keyword spotting performance [17]. But to the best of our knowledge, it is still unclear whether the effects of SSL and meta-learning are additive in general and independent of specific choices of SSL models or meta-learning algorithms.…”

Section: Train a Multi-class Keyword Classification Model On Librispe... (mentioning; confidence: 99%)

On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting

Kao, Wu, Chen et al. 2022

Preprint

User-defined keyword spotting is the task of detecting new spoken terms defined by users. This can be viewed as a few-shot learning problem, since it is unreasonable to expect users to define their desired keywords by providing many examples. To solve this problem, previous works have tried to incorporate self-supervised learning models or apply meta-learning algorithms. But it is unclear whether self-supervised learning and meta-learning are complementary, and which combination of the two types of approaches is most effective for few-shot keyword discovery. In this work, we systematically study these questions by utilizing various self-supervised learning models and combining them with a wide variety of meta-learning algorithms. Our results show that HuBERT combined with Matching network achieves the best result and is robust to changes in the few-shot examples.
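As a rough illustration of the matching-network classification step only, the sketch below assumes utterance embeddings are already available (random vectors stand in for pooled HuBERT features). `cosine_sim` and `matching_net_predict` are hypothetical helper names, and the learned embedding functions and full-context embeddings of the original Matching Networks are omitted, so this is not the code used by Kao et al.

```python
import numpy as np


def cosine_sim(queries: np.ndarray, support: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between query rows and support rows."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    s = support / np.linalg.norm(support, axis=1, keepdims=True)
    return q @ s.T


def matching_net_predict(support_emb, support_labels, query_emb, n_classes):
    """Softmax attention over the support set, then an attention-weighted sum
    of one-hot support labels (the matching-network read-out)."""
    sims = cosine_sim(query_emb, support_emb)          # (n_query, n_support)
    sims = sims - sims.max(axis=1, keepdims=True)      # stabilise the softmax
    attn = np.exp(sims)
    attn /= attn.sum(axis=1, keepdims=True)            # each row sums to 1
    one_hot = np.eye(n_classes)[support_labels]        # (n_support, n_classes)
    probs = attn @ one_hot                             # (n_query, n_classes)
    return probs.argmax(axis=1), probs


# Toy 3-way 5-shot episode; 16-d vectors stand in for pooled SSL embeddings.
rng = np.random.default_rng(0)
centers = 5.0 * np.eye(3, 16)
support_labels = np.repeat(np.arange(3), 5)
support = centers[support_labels] + 0.1 * rng.normal(size=(15, 16))
query_labels = np.array([0, 1, 2])
queries = centers[query_labels] + 0.1 * rng.normal(size=(3, 16))

pred, probs = matching_net_predict(support, support_labels, queries, n_classes=3)
print(pred)
```

Because the prediction is a weighted vote over labelled support examples, new keywords can be added at test time by adding support examples, with no retraining of the classifier head.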

“…Meta-learning is achieved by solving a task-specific ridge regression problem that maps a deep representation to the target TS in closed-form, while the parameters of the representation are learned by backpropagation through the solver. Aside from the original application in few-shot image classification [7], differentiable closed-form solvers have been used for other few-shot problems like visual tracking [47], video object segmentation [24], spoken intent recognition [27] and spatial regression [17], while we are not aware of any application in forecasting.…”
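The statement above describes a per-task ridge regression solved in closed form on top of a shared deep representation. A minimal NumPy sketch of that inner solve for an N-way few-shot episode follows, assuming features are already extracted (synthetic vectors stand in for them) and class labels are regressed as one-hot targets; `ridge_fit` and `ridge_fit_primal` are hypothetical names. It shows only the forward computation: in the cited methods an autodiff framework backpropagates through this solve to train the representation, which plain NumPy does not do.

```python
import numpy as np


def ridge_fit(X: np.ndarray, Y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge regression in dual (Woodbury) form:
    W = X^T (X X^T + lam I)^{-1} Y.
    Only an n x n system is solved, which is cheap when the number of
    support examples n is much smaller than the feature dimension d."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), Y)


def ridge_fit_primal(X: np.ndarray, Y: np.ndarray, lam: float) -> np.ndarray:
    """Equivalent primal form W = (X^T X + lam I)^{-1} X^T Y (d x d solve)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


# Toy 5-way 2-shot episode with 64-d features.
rng = np.random.default_rng(1)
n_way, n_shot, dim = 5, 2, 64
centers = 3.0 * np.eye(n_way, dim)
labels = np.repeat(np.arange(n_way), n_shot)
X_support = centers[labels] + 0.1 * rng.normal(size=(n_way * n_shot, dim))
Y_support = np.eye(n_way)[labels]              # one-hot regression targets

W = ridge_fit(X_support, Y_support, lam=1.0)  # (dim, n_way) task-specific weights
X_query = centers + 0.1 * rng.normal(size=(n_way, dim))
pred = (X_query @ W).argmax(axis=1)
print(pred)
```

For any lam > 0 the two forms give the same W; the dual form is the natural choice in few-shot episodes because the support set is far smaller than the feature dimension.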

Section: Related Work (mentioning; confidence: 99%)

Meta-Forecasting by combining Global Deep Representations with Local Adaptation

Grazzi, Flunkert, Salinas et al. 2021

Preprint

While classical time series forecasting considers individual time series in isolation, recent advances based on deep learning showed that jointly learning from a large pool of related time series can boost the forecasting accuracy. However, the accuracy of these methods suffers greatly when modeling out-of-sample time series, significantly limiting their applicability compared to classical forecasting methods. To bridge this gap, we adopt a meta-learning view of the time series forecasting problem. We introduce a novel forecasting method, called Meta Global-Local Auto-Regression (Meta-GLAR), that adapts to each time series by learning in closed-form the mapping from the representations produced by a recurrent neural network (RNN) to one-step-ahead forecasts. Crucially, the parameters of the RNN are learned across multiple time series by backpropagating through the closed-form adaptation mechanism. In our extensive empirical evaluation we show that our method is competitive with the state-of-the-art in out-of-sample forecasting accuracy reported in earlier work.
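To make the closed-form adaptation step more concrete, the sketch below fits, for a single series, a ridge regression from a per-time-step representation to the next value and uses it for a one-step-ahead forecast. It is a simplification under stated assumptions. A hypothetical `lag_features` window plus a bias term stands in for the RNN representation, and nothing is learned across series, whereas in Meta-GLAR the RNN parameters are trained across many series by backpropagating through this solve. The function names are illustrative, not taken from the paper.

```python
import numpy as np


def lag_features(y: np.ndarray, window: int) -> np.ndarray:
    """Hypothetical stand-in for an RNN state: the last `window` values plus a
    constant bias term, one row per time step t = window-1, ..., len(y)-1."""
    rows = [np.append(y[t - window + 1 : t + 1], 1.0) for t in range(window - 1, len(y))]
    return np.array(rows)


def closed_form_forecast(y: np.ndarray, window: int = 3, lam: float = 1e-3) -> float:
    """Per-series closed-form ridge fit from representation h_t to y_{t+1},
    then a one-step-ahead forecast from the most recent representation."""
    H = lag_features(y, window)
    X, targets = H[:-1], y[window:]     # pair each h_t with the next observation
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ targets)
    return float(H[-1] @ w)             # forecast for time step len(y)


y = 2.0 * np.arange(20) + 1.0           # noise-free linear toy series: 1, 3, ..., 39
forecast = closed_form_forecast(y, lam=1e-6)  # tiny penalty since the data are noise-free
print(forecast)                         # close to 41, the next value of the series
```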

“…A recent development of deep learning has revolutionized various audio-based applications such as emotion recognition (ER) [1], environmental sound classification (ESC) [2], and keyword spotting [3,4]. However, in a real-world setting where deployed audio classification models may need to dynamically incorporate new tasks (i.e., new classes or inputs) from users [5] and changing input distributions [6], current supervised learning approaches are severely limited due to the constrained resources available on edge devices and the catastrophic forgetting (CF) issue [7].…”

Section: Introduction (mentioning; confidence: 99%)

FastICARL: Fast Incremental Classifier and Representation Learning with Efficient Budget Allocation in Audio Sensing Applications

2021

Various incremental learning (IL) approaches have been proposed to help deep learning models learn new tasks/classes continuously without forgetting what was learned previously (i.e., to avoid catastrophic forgetting). With the growing number of deployed audio sensing applications that need to dynamically incorporate new tasks and changing input distributions from users, the ability to perform IL on-device becomes essential for both efficiency and user privacy. However, prior works suffer from high computational costs and storage demands, which hinder the deployment of IL on-device. In this work, to overcome these limitations, we develop an end-to-end, on-device IL framework, FastICARL, that incorporates exemplar-based IL and quantization in the context of audio-based applications. We first employ k-nearest-neighbor to reduce the latency of IL. Then, we jointly utilize a quantization technique to decrease the storage requirements of IL. We implement FastICARL on two types of mobile devices and demonstrate that FastICARL remarkably decreases the IL time by up to 78-92% and the storage requirements by 2-4 times without sacrificing its performance. FastICARL enables complete on-device IL, ensuring user privacy as the user data does not need to leave the device.
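The abstract names two ingredients, k-nearest-neighbor computation to cut IL latency and quantization to cut exemplar storage, without detailing either. The sketch below is one plausible reading, offered as an assumption rather than FastICARL's actual procedure. Exemplars for a class are chosen as the k samples nearest to the class mean (a single distance pass instead of iCaRL-style iterative herding), and the kept exemplars are stored with symmetric 8-bit quantization. All function names are hypothetical, and feature vectors stand in for whatever exemplar representation the paper actually stores.

```python
import numpy as np


def select_exemplars_knn(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical exemplar selection: indices of the k samples nearest
    (Euclidean) to the class mean, ordered nearest-first."""
    mean = features.mean(axis=0)
    dists = np.linalg.norm(features - mean, axis=1)
    idx = np.argpartition(dists, k - 1)[:k]   # k smallest, unordered
    return idx[np.argsort(dists[idx])]


def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization; returns codes and the scale."""
    scale = float(np.abs(x).max()) / 127.0
    if scale == 0.0:
        scale = 1.0                           # all-zero input: any scale works
    codes = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return codes, scale


def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float32 values."""
    return codes.astype(np.float32) * scale


rng = np.random.default_rng(0)
class_feats = rng.normal(size=(200, 40)).astype(np.float32)  # one class's samples
idx = select_exemplars_knn(class_feats, k=20)
exemplars = class_feats[idx]
codes, scale = quantize_int8(exemplars)
recon_err = np.abs(dequantize(codes, scale) - exemplars).max()
print(exemplars.nbytes // codes.nbytes)  # 4: int8 codes use a quarter of the float32 bytes
```

Going from float32 to int8 is a 4x reduction (float16 would be 2x); the sketch does not establish which format, if either, the paper uses.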
