2022
DOI: 10.48550/arxiv.2202.00097
Preprint
Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data

Abstract: Large-scale databases with high-quality manual annotations are scarce in the audio domain. We thus explore a self-supervised graph approach to learning audio representations from highly limited labeled data. Considering each audio sample as a graph node, we propose a subgraph-based framework with novel self-supervision tasks that can learn effective audio representations. During training, subgraphs are constructed by sampling the entire pool of available training data to exploit the relationship between the labele…
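The abstract sketches the core mechanism: treat each audio clip as a node and repeatedly sample subgraphs from the pool of training data. A minimal sketch of one plausible sampler follows; the cosine-similarity kNN edges, the embedding source, and the subgraph size are illustrative assumptions, not necessarily the paper's exact construction.

```python
import numpy as np

def sample_knn_subgraph(embeddings, subgraph_size=64, k=5, rng=None):
    """Sample a subgraph over the training pool: pick `subgraph_size`
    clips at random, then connect each node to its k nearest
    neighbours by cosine similarity (hypothetical construction)."""
    rng = rng or np.random.default_rng()
    n = embeddings.shape[0]
    idx = rng.choice(n, size=min(subgraph_size, n), replace=False)
    feats = embeddings[idx]
    # Cosine similarity between all pairs of sampled clips.
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops from kNN
    # The k nearest neighbours of each node define the edge set.
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    m = len(idx)
    adj = np.zeros((m, m), dtype=np.float32)
    rows = np.repeat(np.arange(m), k)
    adj[rows, neighbours.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)  # symmetrize the adjacency matrix
    return idx, adj

# Example with random stand-in embeddings (in practice these would come
# from some pretrained audio encoder, an assumption here):
# idx, adj = sample_knn_subgraph(np.random.randn(1000, 128).astype(np.float32))
```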


Cited by 1 publication (2 citation statements)
References 33 publications
“…We also compared our method with a graph-based work, in which each node represents an audio clip, a KNN subgraph is constructed, and a GNN is trained using graph self-supervised proxy tasks [36]. We also use two popular spatial and temporal network architectures, ResNet-1D [40] and LSTM, with pretrained embedding features for both audio and video as input, to further investigate the superiority of our graph modelling.…”
Section: Results and Analysis
confidence: 99%
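The citation characterizes the method of [36] as a kNN subgraph over audio clips with a GNN trained on self-supervised proxy tasks. A hedged PyTorch sketch of such a setup follows; the two-layer GCN and the link-prediction proxy loss are stand-ins, since the exact architecture and proxy tasks of [36] are not given in the quoted text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adj(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, standard for GCNs."""
    adj = adj + torch.eye(adj.shape[0])
    d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

class TwoLayerGCN(nn.Module):
    """Minimal two-layer GCN over a subgraph of audio-clip nodes."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, out_dim)

    def forward(self, x, adj_norm):
        x = F.relu(adj_norm @ self.w1(x))
        return adj_norm @ self.w2(x)

def proxy_link_loss(node_emb, adj):
    """Stand-in self-supervised proxy task: recover the subgraph's kNN
    edges from pairwise similarity of the learned node embeddings."""
    logits = node_emb @ node_emb.T
    return F.binary_cross_entropy_with_logits(logits, adj)

# Example: a subgraph of 64 clips with 128-d input features.
x = torch.randn(64, 128)
adj = (torch.rand(64, 64) < 0.1).float()
adj = torch.maximum(adj, adj.T)  # symmetric kNN-style edges
adj.fill_diagonal_(0)
model = TwoLayerGCN(128, 64, 32)
loss = proxy_link_loss(model(x, normalize_adj(adj)), adj)
```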
“…Our network weights are initialized following the Xavier initialization. We used the Adam optimizer with a learning rate of 0.005 and a decay rate of 0.1 after 1500 iterations.…”

Method            Result         #Params
[35]              0.39 ± 0.02    87M
SSL graph [36]    0.42 ± 0.02    218K
Wave-Logmel [37]  0.43 ± 0.04    81M
AST [38]          0.44 ± 0.00    88M

Section: Implementation Details
confidence: 99%
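The quoted implementation details (Xavier initialization, Adam with a learning rate of 0.005, and a 0.1 decay after 1500 iterations) map directly onto standard PyTorch calls. The model below is a hypothetical placeholder; only the initializer, optimizer, and schedule follow the quote.

```python
import torch
import torch.nn as nn

# Placeholder network; the citing paper's architecture is not specified here.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

def init_xavier(module):
    """Xavier-initialize all linear layers, as stated in the quote."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

model.apply(init_xavier)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
# Stepped once per training iteration, this decays the learning rate
# by a factor of 0.1 every 1500 iterations.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1500, gamma=0.1)
```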