ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8683710

Semi-supervised Acoustic Event Detection Based on Tri-training

Abstract: This paper presents our work on training acoustic event detection (AED) models using an unlabeled dataset. Recent acoustic event detectors are based on large-scale neural networks, which are typically trained with huge amounts of labeled data. Labels for acoustic events are expensive to obtain, and relevant acoustic event audio can be limited, especially for rare events. In this paper we leverage an Internet-scale unlabeled dataset with potential domain shift to improve the detection of acoustic events. Based on…

Cited by 22 publications (13 citation statements)
References 16 publications (23 reference statements)
“…A somewhat similar approach that also utilized pseudo labels was proposed by Shi et al. [24]. Their idea was based on the concept of self-training, which leverages a trained model to make predictions on unlabeled data and uses the resulting pseudo labels to update the model.…”
Section: Related Work
confidence: 99%
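The self-training recipe summarized in this statement is essentially a loop: predict on unlabeled clips, keep confident predictions as pseudo labels, and retrain on the enlarged set. A minimal PyTorch sketch of that loop is given below; the model, loaders, and confidence threshold are hypothetical placeholders, not the setup of Shi et al. [24].

```python
# Illustrative self-training loop for multi-label AED (a sketch, not the
# authors' implementation). Assumes `model(x)` returns clip-level logits of
# shape (batch, num_events); all names and thresholds are hypothetical.
import torch
import torch.nn.functional as F

def pseudo_label(model, unlabeled_loader, threshold=0.9):
    """Collect (features, pseudo_target) batches the model is confident about."""
    model.eval()
    selected = []
    with torch.no_grad():
        for x in unlabeled_loader:
            probs = torch.sigmoid(model(x))                # clip-level event scores
            keep = probs.max(dim=1).values >= threshold    # keep only confident clips
            if keep.any():
                selected.append((x[keep], (probs[keep] >= 0.5).float()))
    return selected

def self_training_round(model, labeled_batches, unlabeled_loader, optimizer):
    """One round: pseudo-label the unlabeled pool, then retrain on both sets."""
    pseudo_batches = pseudo_label(model, unlabeled_loader)
    model.train()
    for x, y in list(labeled_batches) + pseudo_batches:
        loss = F.binary_cross_entropy_with_logits(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```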
“…Their idea was based on the concept of self-training, which leverages a trained model to make predictions on unlabeled data and uses resulting pseudo labels to update the model. However, as Shi et al [24] mentioned, such a training method can induce a large amount of noise. As such, they proposed to mitigate this problem by training multiple models and adding data according to the agreement of those models.…”
Section: Related Work
confidence: 99%
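As this statement notes, plain self-training can feed a large amount of label noise back into the model; tri-training counters this by accepting an unlabeled example for one model only when the other two models agree on its label. A hedged sketch of that selection rule, assuming three multi-label AED models with identical output shapes (names and threshold are illustrative, not the paper's code):

```python
# Agreement-based selection in the spirit of tri-training (illustrative only).
# Each model k receives the unlabeled clips on which its two peers produce
# identical binary label vectors.
import torch

def agreed_pseudo_labels(models, unlabeled_loader, threshold=0.5):
    """For each of three models, collect clips its two peers agree on."""
    assert len(models) == 3
    additions = {k: [] for k in range(3)}
    for m in models:
        m.eval()
    with torch.no_grad():
        for x in unlabeled_loader:
            preds = [(torch.sigmoid(m(x)) >= threshold).float() for m in models]
            for k in range(3):
                i, j = [idx for idx in range(3) if idx != k]
                agree = (preds[i] == preds[j]).all(dim=1)  # peers give the same labels
                if agree.any():
                    additions[k].append((x[agree], preds[i][agree]))
    return additions
```

The per-model batches in `additions` would then be mixed with the labeled data to update the corresponding model, which is the step that dilutes pseudo-label noise compared with single-model self-training.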
“…as is the case with other deep learning networks, these methods require a large amount of labeled data to optimize the system parameters. As curation of large labeled datasets is expensive and time consuming, alternative ideas have been explored, particularly semi-supervised learning, which leverages extensive unlabeled data in combination with small amounts of labeled data [8,6,9]. The Detection and Classification of Acoustic Scenes and Events (DCASE) task 4 challenge focuses on the use of unlabeled data along with weakly labeled data, which includes the identity of sound event classes without time boundary markings, in order to train an SED system [10].…”
Section: Introduction
confidence: 99%
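A common way to combine these data sources in semi-supervised SED is a joint objective: a frame-level loss on the small strongly labeled set, a clip-level loss on weakly labeled clips (event identity only, no time boundaries), and a consistency term on unlabeled audio, for example against a mean-teacher model. The sketch below is a generic illustration under those assumptions, not the DCASE baseline or any of the cited systems; the model interface and loss weights are hypothetical.

```python
# Generic semi-supervised SED objective (a sketch under stated assumptions).
# Assumes `model(x)` returns (frame_probs, clip_probs) and `ema_model` is an
# exponential-moving-average teacher; all names and weights are illustrative.
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, ema_model, strong_batch, weak_batch,
                         unlabeled_batch, w_weak=1.0, w_cons=1.0):
    x_s, y_frames = strong_batch              # frame-level (strong) targets
    x_w, y_clip = weak_batch                  # clip-level (weak) targets
    x_u = unlabeled_batch                     # no targets at all

    frame_s, _ = model(x_s)
    _, clip_w = model(x_w)
    frame_u, clip_u = model(x_u)

    loss_strong = F.binary_cross_entropy(frame_s, y_frames)
    loss_weak = F.binary_cross_entropy(clip_w, y_clip)

    with torch.no_grad():                     # teacher predictions for consistency
        frame_t, clip_t = ema_model(x_u)
    loss_cons = F.mse_loss(frame_u, frame_t) + F.mse_loss(clip_u, clip_t)

    return loss_strong + w_weak * loss_weak + w_cons * loss_cons
```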
“…Accurate labels and the markings of temporal boundaries are critical to train the model; however, generating them is often quite expensive and time consuming. Semi-supervised learning, which leverages extensive unlabeled data in combination with small amounts of labeled data, has been explored to resolve the issue in data collection [6], [8], [9]. In the recent challenge of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020, task 4 involves building an SED model in a semi-supervised fashion.…”
Section: Introduction
confidence: 99%