Abstract: In this paper, we introduce a novel attentional similarity module for the problem of few-shot sound recognition. Given a few examples of an unseen sound event, a classifier must be quickly adapted to recognize the new sound event without much fine-tuning. The proposed attentional similarity module can be plugged into any metric-based learning method for few-shot learning, allowing the resulting model to especially match related short sound events. Extensive experiments on two datasets show that the proposed mo…
“…In the inference step, the input is classified into one of the target classes using the learned feature space. For deep metric learning in acoustic signal processing [17][18][19][20][21][22][23][24][25][26], we summarize the tasks, loss functions, and sampling strategies in Table 1. Manocha et al. have worked on a sound clip search task and used a contrastive loss, where the feature space is learned from pairs labeled as same-class or different-class together with their distance in the feature space [19].…”
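The contrastive setup this snippet describes, where same-class pairs are pulled together and different-class pairs are pushed apart in the feature space, can be made concrete in a few lines. The following is a minimal PyTorch sketch, not the exact model of [19]; the function name and the margin value are illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Pairwise contrastive loss over a batch of embedding pairs.

    emb_a, emb_b: (batch, dim) embeddings from a shared encoder.
    same_class:   (batch,) float tensor, 1.0 if the pair shares a class.
    Same-class pairs are attracted; different-class pairs are repelled
    until they are at least `margin` apart.
    """
    dist = F.pairwise_distance(emb_a, emb_b)                 # Euclidean distance per pair
    pos = same_class * dist.pow(2)                           # pull positives together
    neg = (1.0 - same_class) * F.relu(margin - dist).pow(2)  # push negatives apart
    return 0.5 * (pos + neg).mean()
```

The sampling strategy then reduces to drawing a mix of same-class and different-class pairs per batch, which is one of the design axes the surveyed papers differ on.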
Few-shot learning systems for sound event recognition have gained interest since they require only a few examples to adapt to new target classes without fine-tuning. However, such systems have so far been applied only to short chunks of sound for classification or verification. In this paper, we aim to achieve few-shot detection of rare sound events in long query sequences that contain not only the target events but also other events and background noise. It is therefore necessary to prevent false positives on both the other events and the background noise. We propose metric learning with a background noise class for few-shot detection. Our contributions are the explicit inclusion of background noise as an independent class, a suitable loss function that emphasizes this additional class, and a corresponding sampling strategy that assists training. Together they yield a feature space in which the event classes and the background noise class are sufficiently separated. Evaluations on few-shot detection tasks using DCASE 2017 Task 2 and ESC-50 show that our proposed method outperforms metric learning that does not consider the background noise class. The few-shot detection performance is also comparable to that of the DCASE 2017 Task 2 baseline system, which requires a huge amount of annotated audio data.
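To make the idea concrete, here is a minimal sketch (not the paper's exact implementation) of a prototypical-style episode loss with an explicit background-noise class. The `bg_weight` hyperparameter, standing in for "a suitable loss function that emphasizes this additional class", and all tensor names are assumptions.

```python
import torch
import torch.nn.functional as F

def episode_loss(support, query, query_labels, bg_support, bg_query, bg_weight=1.0):
    """Prototypical-style episode loss with an explicit background class.

    support:      (n_way, k_shot, dim) embedded support examples per event class.
    query:        (n_query, dim) embedded queries from the event classes.
    query_labels: (n_query,) class indices in [0, n_way).
    bg_support:   (k_shot, dim) embedded background-noise segments.
    bg_query:     (n_bg, dim) background queries, assigned to the extra class.
    bg_weight:    scalar emphasizing the background term (assumed hyperparameter).
    """
    protos = support.mean(dim=1)                       # (n_way, dim) event prototypes
    bg_proto = bg_support.mean(dim=0, keepdim=True)    # (1, dim) background prototype
    all_protos = torch.cat([protos, bg_proto], dim=0)  # background gets index n_way

    def logits(x):
        return -torch.cdist(x, all_protos)             # negative Euclidean distance

    event_loss = F.cross_entropy(logits(query), query_labels)
    bg_labels = torch.full((bg_query.size(0),), protos.size(0), dtype=torch.long)
    bg_loss = F.cross_entropy(logits(bg_query), bg_labels)
    return event_loss + bg_weight * bg_loss
```

With the background prototype present, a query frame far from every event prototype is absorbed by the background class instead of being forced onto the nearest event, which is what suppresses false positives on noise.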
“…The characteristics of this paradigm are highly compatible with COVID-19 disease detection tasks. Inspired by this, the experimental method we adopt when pre-training the model is consistent with [7]. Each iteration randomly selects c categories from all categories, with k samples per category, and holds out one sample from each of the c categories as the test set.…”
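A minimal sketch of this episodic sampling, assuming a dataset given as a list of (features, label) pairs; all names are illustrative.

```python
import random
from collections import defaultdict

def sample_episode(dataset, c=5, k=5):
    """Sample one few-shot episode as the snippet describes: pick c classes,
    k support samples per class, plus one held-out query per class.

    dataset: list of (features, label) pairs.
    Returns (support, query) lists of (features, label) pairs.
    """
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)

    # Only classes with at least k + 1 samples can supply a query example.
    eligible = [y for y, xs in by_class.items() if len(xs) > k]
    classes = random.sample(eligible, c)

    support, query = [], []
    for y in classes:
        picks = random.sample(by_class[y], k + 1)  # k support + 1 query
        support += [(x, y) for x in picks[:k]]
        query.append((picks[k], y))
    return support, query
```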
Section: Cough Classification Algorithm
“…Based on the training strategy of few-shot learning, we introduce attentional similarity to complete the task of cough classification [7]. Unlike previous methods, which compute similarity after pooling features to the same length, it can directly take input features of different lengths and compute the attentional similarity between the input features and the features of a given class.…”
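One way such an attentional similarity over variable-length inputs can be realized is sketched below. This is an illustration in the spirit of [7], not its exact formulation; the linear attention scorer and the max-over-frames aggregation are assumptions.

```python
import torch
import torch.nn as nn

class AttentionalSimilarity(nn.Module):
    """Similarity between two variable-length frame-level feature sequences.

    Frame-to-frame similarities are aggregated with learned attention
    weights, so short transient events can dominate the match instead of
    being averaged away by pooling.
    """

    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(dim, 1)  # assumed learnable attention scorer

    def forward(self, query_feats, class_feats):
        # query_feats: (T_q, dim), class_feats: (T_c, dim); T_q and T_c may differ.
        sim = query_feats @ class_feats.t()             # (T_q, T_c) frame similarities
        per_frame = sim.max(dim=1).values               # best match per query frame
        w = torch.softmax(self.attn(query_feats).squeeze(-1), dim=0)  # (T_q,) weights
        return (w * per_frame).sum()                    # scalar similarity score
```

Because nothing here requires T_q to equal T_c, the module accepts inputs of different lengths directly, which is the property the snippet highlights.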
The global outbreak of COVID-19 has drawn much attention recently. The elderly are more vulnerable to COVID-19 and tend to have more severe conditions and higher mortality, as their immune function has declined and they are prone to multiple chronic diseases. Therefore, avoiding viral infection and early detection and treatment of infection in the elderly are important measures to protect their safety. In this paper, we propose a real-time robot-based COVID-19 detection system: Epidemic Guard. It combines speech recognition, keyword detection, cough classification, and medical services to convert real-time audio into structured data that records the user's real condition. These data can be further utilized by a rules engine to provide a basis for real-time supervision and medical services. In addition, Epidemic Guard comes with a powerful pre-trained model to adapt effectively to each user's health status.
“…As an alternative, few-shot learning [9][10][11][12][13][14] has been applied to audio classification [15][16][17] and sound event detection [18,19], where a classifier must learn to recognize a novel class from very few examples. Among different few-shot learning methods, metric-based prototypical networks [12] have been shown to yield excellent performance for audio [15,18,19].…”
Supervised learning for audio classification typically imposes a fixed class vocabulary, which can be limiting for real-world applications where the target class vocabulary is not known a priori or changes dynamically. In this work, we introduce a few-shot continual learning framework for audio classification, in which a trained base classifier can be continuously expanded to recognize novel classes from only a few labeled examples at inference time. This enables fast and interactive model updates by end users with minimal human effort. To do so, we leverage the dynamic few-shot learning technique and adapt it to a challenging multi-label audio classification scenario. We incorporate a recent state-of-the-art audio feature extraction model as a backbone and perform a comparative analysis of our approach on two popular audio datasets (ESC-50 and AudioSet). We conduct an in-depth evaluation to illustrate the complexities of the problem and show that, while there is still room for improvement, our method outperforms three baselines on novel class detection while maintaining its performance on base classes.
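The expansion step can be sketched as follows, assuming a cosine classifier whose novel-class weights are obtained by averaging normalized few-shot embeddings (the actual dynamic few-shot method also trains a weight generator), with per-class sigmoids for the multi-label scenario. All names and the scale/threshold values are illustrative.

```python
import torch
import torch.nn.functional as F

def expand_classifier(base_weights, novel_embeddings):
    """Append novel-class weight vectors to a cosine classifier at inference time.

    base_weights:     (n_base, dim) per-class weight vectors of the base classifier.
    novel_embeddings: (n_novel, k_shot, dim) embedded few-shot examples.
    """
    # Average the shots per novel class, then normalize to unit length.
    novel_weights = F.normalize(novel_embeddings.mean(dim=1), dim=-1)
    return torch.cat([base_weights, novel_weights], dim=0)

def predict_multilabel(features, weights, scale=10.0, threshold=0.5):
    """Multi-label prediction via scaled cosine similarity and per-class sigmoid."""
    feats = F.normalize(features, dim=-1)
    w = F.normalize(weights, dim=-1)
    probs = torch.sigmoid(scale * feats @ w.t())  # (batch, n_base + n_novel)
    return probs > threshold
```

Because base weights are left untouched and novel weights are simply concatenated, the base classes keep their decision boundaries while new classes become available immediately, which is the behavior the abstract evaluates.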