2022
DOI: 10.1109/jstsp.2022.3190083
Self-Supervised Graphs for Audio Representation Learning With Limited Labeled Data

Cited by 4 publications (2 citation statements)
References 49 publications

“…The Spectrogram-VGG model is the same as configuration A in [34] with only one change: the final layer is a softmax with 33 units. The feature for each audio input to…”

Comparison results embedded in the quoted passage (model: first score ± std, second score ± std, parameter count):
VATT [30]: 0.39 ± 0.02, -, 87M
SSL graph [31]: 0.42 ± 0.02, -, 218K
Wave-Logmel [32]: 0.43 ± 0.04, -, 81M
AST [33]: 0.44 ± 0.00, -, 88M
VAED [15]: 0.50 ± 0.01, 0.93 ± 0.00, 2.1M

Section: Results and Analysis (mentioning)
confidence: 99%

“…The VATT [30] is a self-supervised multimodal transformer with a modality-agnostic, single-backbone Transformer that shares weights between the audio and video modalities. We also compared our method with recent graph-based works [31, 15]. The Wave-Logmel [32] is a supervised CNN model that takes both the waveform and the log mel spectrogram as input.…”
Section: Results and Analysis (mentioning)
confidence: 99%