2023
DOI: 10.1186/s13636-023-00297-4
|View full text |Cite
|
Sign up to set email alerts
|

Training audio transformers for cover song identification

Te Zeng,
Francis C. M. Lau

Abstract: In the past decades, convolutional neural networks (CNNs) have been commonly adopted in audio perception tasks, which aim to learn latent representations. However, for audio analysis, CNNs may exhibit limitations in effectively modeling temporal contextual information. Analogous to the successes of transformer architecture used in the fields of computer vision and audio classification, to capture long-range global contexts better, we here extend this line of work and propose an Audio Similarity Transformer (AS… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 31 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?