Abstract: In this paper, we introduce a novel attentional similarity module for the problem of few-shot sound recognition. Given a few examples of an unseen sound event, a classifier must be quickly adapted to recognize the new sound event without much fine-tuning. The proposed attentional similarity module can be plugged into any metric-based learning method for few-shot learning, allowing the resulting model to especially match related short sound events. Extensive experiments on two datasets show that the proposed mo…
“…In the inference step, the input is classified into one of the target classes using the learned feature space. For deep metric learning in acoustic signal processing [17][18][19][20][21][22][23][24][25][26], we summarize the tasks, loss functions, and sampling strategies in Table 1. Manocha et al. have worked on a sound clip search task and used a contrastive loss, where the feature space is learned from pairs labeled as same-class or different-class together with their distance in the feature space [19].…”
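The contrastive setup this snippet describes, where same-class pairs are pulled together and different-class pairs are pushed apart in the feature space, can be made concrete in a few lines. The following is a minimal PyTorch sketch, not the exact model of [19]; the function name and the margin value are illustrative.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Pairwise contrastive loss over a batch of embedding pairs.

    emb_a, emb_b: (batch, dim) embeddings from a shared encoder.
    same_class:   (batch,) float tensor, 1.0 if the pair shares a class.
    Same-class pairs are attracted; different-class pairs are repelled
    until they are at least `margin` apart.
    """
    dist = F.pairwise_distance(emb_a, emb_b)                 # Euclidean distance per pair
    pos = same_class * dist.pow(2)                           # pull positives together
    neg = (1.0 - same_class) * F.relu(margin - dist).pow(2)  # push negatives apart
    return 0.5 * (pos + neg).mean()
```

The sampling strategy then reduces to drawing a mix of same-class and different-class pairs per batch, which is one of the design axes the surveyed papers differ on.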
Few-shot learning systems for sound event recognition have gained interest since they require only a few examples to adapt to new target classes without fine-tuning. However, such systems have so far been applied only to short chunks of sound for classification or verification. In this paper, we aim to achieve few-shot detection of rare sound events in long query sequences that contain not only the target events but also other events and background noise. It is therefore necessary to prevent false positives on both the other events and the background noise. We propose metric learning with a background noise class for few-shot detection. Our contributions are the explicit inclusion of background noise as an independent class, a suitable loss function that emphasizes this additional class, and a corresponding sampling strategy that assists training. Together they yield a feature space in which the event classes and the background noise class are sufficiently separated. Evaluations on few-shot detection tasks using DCASE 2017 Task 2 and ESC-50 show that our proposed method outperforms metric learning that does not consider the background noise class. The few-shot detection performance is also comparable to that of the DCASE 2017 Task 2 baseline system, which requires a huge amount of annotated audio data.
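To make the idea concrete, here is a minimal sketch (not the paper's exact implementation) of a prototypical-style episode loss with an explicit background-noise class. The `bg_weight` hyperparameter, standing in for "a suitable loss function that emphasizes this additional class", and all tensor names are assumptions.

```python
import torch
import torch.nn.functional as F

def episode_loss(support, query, query_labels, bg_support, bg_query, bg_weight=1.0):
    """Prototypical-style episode loss with an explicit background class.

    support:      (n_way, k_shot, dim) embedded support examples per event class.
    query:        (n_query, dim) embedded queries from the event classes.
    query_labels: (n_query,) class indices in [0, n_way).
    bg_support:   (k_shot, dim) embedded background-noise segments.
    bg_query:     (n_bg, dim) background queries, assigned to the extra class.
    bg_weight:    scalar emphasizing the background term (assumed hyperparameter).
    """
    protos = support.mean(dim=1)                       # (n_way, dim) event prototypes
    bg_proto = bg_support.mean(dim=0, keepdim=True)    # (1, dim) background prototype
    all_protos = torch.cat([protos, bg_proto], dim=0)  # background gets index n_way

    def logits(x):
        return -torch.cdist(x, all_protos)             # negative Euclidean distance

    event_loss = F.cross_entropy(logits(query), query_labels)
    bg_labels = torch.full((bg_query.size(0),), protos.size(0), dtype=torch.long)
    bg_loss = F.cross_entropy(logits(bg_query), bg_labels)
    return event_loss + bg_weight * bg_loss
```

With the background prototype present, a query frame far from every event prototype is absorbed by the background class instead of being forced onto the nearest event, which is what suppresses false positives on noise.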
“…The characteristics of this paradigm are highly compatible with COVID-19 disease detection tasks. Inspired by this, the experimental method we adopt when pre-training the model is consistent with [7]. Each iteration randomly selects c categories from all categories, with k samples per category, and holds out one sample from each of the c categories as the test set.…”
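A minimal sketch of this episodic sampling, assuming a dataset given as a list of (features, label) pairs; all names are illustrative.

```python
import random
from collections import defaultdict

def sample_episode(dataset, c=5, k=5):
    """Sample one few-shot episode as the snippet describes: pick c classes,
    k support samples per class, plus one held-out query per class.

    dataset: list of (features, label) pairs.
    Returns (support, query) lists of (features, label) pairs.
    """
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)

    # Only classes with at least k + 1 samples can supply a query example.
    eligible = [y for y, xs in by_class.items() if len(xs) > k]
    classes = random.sample(eligible, c)

    support, query = [], []
    for y in classes:
        picks = random.sample(by_class[y], k + 1)  # k support + 1 query
        support += [(x, y) for x in picks[:k]]
        query.append((picks[k], y))
    return support, query
```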
Section: Cough Classification Algorithm
“…Based on the training strategy of few-shot learning, we introduce attentional similarity to complete the task of cough classification [7]. Unlike previous methods, which compute similarity after pooling features to the same length, it can directly take input features of different lengths and compute the attentional similarity between the input features and the features of a given class.…”
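One way such an attentional similarity over variable-length inputs can be realized is sketched below. This is an illustration in the spirit of [7], not its exact formulation; the linear attention scorer and the max-over-frames aggregation are assumptions.

```python
import torch
import torch.nn as nn

class AttentionalSimilarity(nn.Module):
    """Similarity between two variable-length frame-level feature sequences.

    Frame-to-frame similarities are aggregated with learned attention
    weights, so short transient events can dominate the match instead of
    being averaged away by pooling.
    """

    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(dim, 1)  # assumed learnable attention scorer

    def forward(self, query_feats, class_feats):
        # query_feats: (T_q, dim), class_feats: (T_c, dim); T_q and T_c may differ.
        sim = query_feats @ class_feats.t()             # (T_q, T_c) frame similarities
        per_frame = sim.max(dim=1).values               # best match per query frame
        w = torch.softmax(self.attn(query_feats).squeeze(-1), dim=0)  # (T_q,) weights
        return (w * per_frame).sum()                    # scalar similarity score
```

Because nothing here requires T_q to equal T_c, the module accepts inputs of different lengths directly, which is the property the snippet highlights.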
The global outbreak of COVID-19 has drawn much attention recently. The elderly are more vulnerable to COVID-19 and tend to have more severe conditions and higher mortality, as their immune function has declined and they are prone to multiple chronic diseases. Therefore, avoiding viral infection and early detection and treatment of infection in the elderly are important measures to protect their safety. In this paper, we propose a real-time robot-based COVID-19 detection system: Epidemic Guard. It combines speech recognition, keyword detection, cough classification, and medical services to convert real-time audio into structured data that records the user's real condition. These data can be further utilized by a rules engine to provide a basis for real-time supervision and medical services. In addition, Epidemic Guard comes with a powerful pre-trained model to adapt effectively to each user's health status.
“…As an alternative, few-shot learning [9][10][11][12][13][14] has been applied to audio classification [15][16][17] and sound event detection [18,19], where a classifier must learn to recognize a novel class from very few examples. Among different few-shot learning methods, metric-based prototypical networks [12] have been shown to yield excellent performance for audio [15,18,19].…”
Supervised learning for audio classification typically imposes a fixed class vocabulary, which can be limiting for real-world applications where the target class vocabulary is not known a priori or changes dynamically. In this work, we introduce a few-shot continual learning framework for audio classification, in which a trained base classifier can be continuously expanded to recognize novel classes from only a few labeled examples at inference time. This enables fast and interactive model updates by end users with minimal human effort. To do so, we leverage the dynamic few-shot learning technique and adapt it to a challenging multi-label audio classification scenario. We incorporate a recent state-of-the-art audio feature extraction model as a backbone and perform a comparative analysis of our approach on two popular audio datasets (ESC-50 and AudioSet). We conduct an in-depth evaluation to illustrate the complexities of the problem and show that, while there is still room for improvement, our method outperforms three baselines on novel class detection while maintaining its performance on base classes.
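The expansion step can be sketched as follows, assuming a cosine classifier whose novel-class weights are obtained by averaging normalized few-shot embeddings (the actual dynamic few-shot method also trains a weight generator), with per-class sigmoids for the multi-label scenario. All names and the scale/threshold values are illustrative.

```python
import torch
import torch.nn.functional as F

def expand_classifier(base_weights, novel_embeddings):
    """Append novel-class weight vectors to a cosine classifier at inference time.

    base_weights:     (n_base, dim) per-class weight vectors of the base classifier.
    novel_embeddings: (n_novel, k_shot, dim) embedded few-shot examples.
    """
    # Average the shots per novel class, then normalize to unit length.
    novel_weights = F.normalize(novel_embeddings.mean(dim=1), dim=-1)
    return torch.cat([base_weights, novel_weights], dim=0)

def predict_multilabel(features, weights, scale=10.0, threshold=0.5):
    """Multi-label prediction via scaled cosine similarity and per-class sigmoid."""
    feats = F.normalize(features, dim=-1)
    w = F.normalize(weights, dim=-1)
    probs = torch.sigmoid(scale * feats @ w.t())  # (batch, n_base + n_novel)
    return probs > threshold
```

Because base weights are left untouched and novel weights are simply concatenated, the base classes keep their decision boundaries while new classes become available immediately, which is the behavior the abstract evaluates.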