ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9415009

Unsupervised Contrastive Learning of Sound Event Representations

Abstract: Self-supervised representation learning can mitigate the limitations in recognition tasks with few manually labeled data but abundant unlabeled data, a common scenario in sound event research. In this work, we explore unsupervised contrastive learning as a way to learn sound event representations. To this end, we propose to use the pretext task of contrasting differently augmented views of sound events. The views are computed primarily via mixing of training examples with unrelated backgrounds, followed by othe…
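The pretext task outlined in the abstract (two views of the same clip, each mixed with a different, unrelated background) can be illustrated in a few lines. The sketch below is a minimal NumPy illustration under assumed names (mix_with_background, make_views) and an assumed SNR-based mixing rule; it is not the paper's implementation, whose augmentation chain includes further transformations.

```python
import numpy as np

def mix_with_background(clip: np.ndarray, background: np.ndarray,
                        snr_db: float = 5.0) -> np.ndarray:
    """Mix a clip with an unrelated background at a given clip-to-background SNR.

    The SNR-based gain rule is an illustrative assumption, not the paper's scheme.
    """
    clip_power = np.mean(clip ** 2) + 1e-12
    bg_power = np.mean(background ** 2) + 1e-12
    gain = np.sqrt(clip_power / (bg_power * 10.0 ** (snr_db / 10.0)))
    return clip + gain * background

def make_views(clip, backgrounds, rng=None):
    """Return two differently augmented views of the same clip (a positive pair)."""
    rng = rng or np.random.default_rng()
    # Each view mixes the clip with a different, unrelated background.
    i, j = rng.choice(len(backgrounds), size=2, replace=False)
    return (mix_with_background(clip, backgrounds[i]),
            mix_with_background(clip, backgrounds[j]))
```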

Cited by 34 publications (59 citation statements) | References 20 publications

Citation statements (ordered by relevance):
“…We choose mixup because the concept of mixing sounds is an audio-informed operation, and it has been proven useful for SET [2,3,29] and other sound event research tasks [30]. In our view, mixup can be interpreted from two different perspectives.…”
Section: Mixup
confidence: 99%
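For reference, the mixup operation discussed in this statement, in its standard form, blends two examples and their labels with a Beta-sampled weight. A minimal sketch, assuming one-hot label vectors; alpha=0.2 is an illustrative default, not a value taken from the cited papers:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two examples and their one-hot labels with a Beta(alpha, alpha) weight."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)           # mixing weight in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2        # mixed input (waveform or spectrogram)
    y = lam * y1 + (1.0 - lam) * y2        # mixed soft label
    return x, y
```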
“…This paradigm has seen major progress in computer vision [9,10,11] and in speech recognition [12,13,7]. For general-purpose audio, including a variety of environmental sounds beyond speech, the majority of works are based on contrastive learning [14,15,16,17,18,19], where a representation is learned by comparing pairs of examples selected by some semantically-correlated notion of similarity [20]. Specifically, comparisons are made between positive pairs of "similar" and negative pairs of "dissimilar" examples, with the goal of learning a representation that pulls together positive pairs and thus reflects semantic structure.…”
Section: Introduction
confidence: 99%
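The pull-together/push-apart objective described in this statement is commonly instantiated as the InfoNCE (NT-Xent) loss. A minimal PyTorch sketch, assuming z1[i] and z2[i] are embeddings of the two views of example i and that only cross-view pairs serve as negatives:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """NT-Xent: (z1[i], z2[i]) are positives; other cross-view pairs are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                          # cosine similarity / temperature
    targets = torch.arange(z1.size(0), device=z1.device)
    # Symmetrize over both view orders; the diagonal holds the positive pairs.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```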
“…Recently, promising results have been attained by contrastive learning approaches that solve the proxy task of similarity maximization [17,18,19], following the seminal SimCLR work in visual representation learning [9]. This method consists of maximizing the similarity between differently-augmented views of the same input audio example.…”
Section: Introduction
confidence: 99%
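Combining the two ideas, a SimCLR-style update augments each clip twice, encodes both views, and minimizes the contrastive loss between them. The sketch below reuses the info_nce function from the previous snippet; augment, encoder, and optimizer are hypothetical placeholders for an audio augmentation pipeline, an embedding network, and any torch optimizer:

```python
def train_step(batch, augment, encoder, optimizer, tau=0.1):
    """One SimCLR-style step: maximize similarity between two views of each clip."""
    v1, v2 = augment(batch), augment(batch)   # two stochastic views per example
    loss = info_nce(encoder(v1), encoder(v2), tau)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```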