ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683710

Semi-supervised Acoustic Event Detection Based on Tri-training

Abstract: This paper presents our work on training acoustic event detection (AED) models using an unlabeled dataset. Recent acoustic event detectors are based on large-scale neural networks, which are typically trained with huge amounts of labeled data. Labels for acoustic events are expensive to obtain, and relevant acoustic event audio can be limited, especially for rare events. In this paper we leverage an Internet-scale unlabeled dataset with potential domain shift to improve the detection of acoustic events. Based on…

Cited by 22 publications (13 citation statements)
References 16 publications (23 reference statements)
“…A somewhat similar approach that also utilized pseudo labels was proposed by Shi et al. [24]. Their idea was based on the concept of self-training, which leverages a trained model to make predictions on unlabeled data and uses the resulting pseudo labels to update the model.…”
Section: Related Work
confidence: 99%
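The self-training recipe summarized in this statement is essentially a loop: predict on unlabeled clips, keep confident predictions as pseudo labels, and retrain on the enlarged set. A minimal PyTorch sketch of that loop is given below; the model, loaders, and confidence threshold are hypothetical placeholders, not the setup of Shi et al. [24].

```python
# Illustrative self-training loop for multi-label AED (a sketch, not the
# authors' implementation). Assumes `model(x)` returns clip-level logits of
# shape (batch, num_events); all names and thresholds are hypothetical.
import torch
import torch.nn.functional as F

def pseudo_label(model, unlabeled_loader, threshold=0.9):
    """Collect (features, pseudo_target) batches the model is confident about."""
    model.eval()
    selected = []
    with torch.no_grad():
        for x in unlabeled_loader:
            probs = torch.sigmoid(model(x))                # clip-level event scores
            keep = probs.max(dim=1).values >= threshold    # keep only confident clips
            if keep.any():
                selected.append((x[keep], (probs[keep] >= 0.5).float()))
    return selected

def self_training_round(model, labeled_batches, unlabeled_loader, optimizer):
    """One round: pseudo-label the unlabeled pool, then retrain on both sets."""
    pseudo_batches = pseudo_label(model, unlabeled_loader)
    model.train()
    for x, y in list(labeled_batches) + pseudo_batches:
        loss = F.binary_cross_entropy_with_logits(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```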
“…Their idea was based on the concept of self-training, which leverages a trained model to make predictions on unlabeled data and uses resulting pseudo labels to update the model. However, as Shi et al [24] mentioned, such a training method can induce a large amount of noise. As such, they proposed to mitigate this problem by training multiple models and adding data according to the agreement of those models.…”
Section: Related Work
confidence: 99%
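As this statement notes, plain self-training can feed a large amount of label noise back into the model; tri-training counters this by accepting an unlabeled example for one model only when the other two models agree on its label. A hedged sketch of that selection rule, assuming three multi-label AED models with identical output shapes (names and threshold are illustrative, not the paper's code):

```python
# Agreement-based selection in the spirit of tri-training (illustrative only).
# Each model k receives the unlabeled clips on which its two peers produce
# identical binary label vectors.
import torch

def agreed_pseudo_labels(models, unlabeled_loader, threshold=0.5):
    """For each of three models, collect clips its two peers agree on."""
    assert len(models) == 3
    additions = {k: [] for k in range(3)}
    for m in models:
        m.eval()
    with torch.no_grad():
        for x in unlabeled_loader:
            preds = [(torch.sigmoid(m(x)) >= threshold).float() for m in models]
            for k in range(3):
                i, j = [idx for idx in range(3) if idx != k]
                agree = (preds[i] == preds[j]).all(dim=1)  # peers give the same labels
                if agree.any():
                    additions[k].append((x[agree], preds[i][agree]))
    return additions
```

The per-model batches in `additions` would then be mixed with the labeled data to update the corresponding model, which is the step that dilutes pseudo-label noise compared with single-model self-training.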
“…as is the case with other deep learning networks, these methods require a large amount of labeled data to optimize the system parameters. As curation of large labeled datasets is expensive and time consuming, alternative ideas have been explored, particularly semi-supervised learning, which leverages extensive unlabeled data in combination with small amounts of labeled data [8,6,9]. The Detection and Classification of Acoustic Scenes and Events (DCASE) task 4 challenge focuses on the use of unlabeled data along with weakly labeled data, which includes the identity of sound event classes without time boundary markings, in order to train an SED system [10].…”
Section: Introduction
confidence: 99%
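A common way to combine these data sources in semi-supervised SED is a joint objective: a frame-level loss on the small strongly labeled set, a clip-level loss on weakly labeled clips (event identity only, no time boundaries), and a consistency term on unlabeled audio, for example against a mean-teacher model. The sketch below is a generic illustration under those assumptions, not the DCASE baseline or any of the cited systems; the model interface and loss weights are hypothetical.

```python
# Generic semi-supervised SED objective (a sketch under stated assumptions).
# Assumes `model(x)` returns (frame_probs, clip_probs) and `ema_model` is an
# exponential-moving-average teacher; all names and weights are illustrative.
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, ema_model, strong_batch, weak_batch,
                         unlabeled_batch, w_weak=1.0, w_cons=1.0):
    x_s, y_frames = strong_batch              # frame-level (strong) targets
    x_w, y_clip = weak_batch                  # clip-level (weak) targets
    x_u = unlabeled_batch                     # no targets at all

    frame_s, _ = model(x_s)
    _, clip_w = model(x_w)
    frame_u, clip_u = model(x_u)

    loss_strong = F.binary_cross_entropy(frame_s, y_frames)
    loss_weak = F.binary_cross_entropy(clip_w, y_clip)

    with torch.no_grad():                     # teacher predictions for consistency
        frame_t, clip_t = ema_model(x_u)
    loss_cons = F.mse_loss(frame_u, frame_t) + F.mse_loss(clip_u, clip_t)

    return loss_strong + w_weak * loss_weak + w_cons * loss_cons
```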
“…Accurate labels and the markings of temporal boundaries are critical to train the model; however, generating them is often quite expensive and time consuming. Semi-supervised learning, which leverages extensive unlabeled data in combination with small amounts of labeled data, has been explored to resolve the issue in data collection [6], [8], [9]. In the recent challenge of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020, task 4 involves building an SED model in a semi-supervised fashion.…”
Section: Introduction
confidence: 99%