2017
DOI: 10.1109/lsp.2017.2713830
|View full text |Cite
|
Sign up to set email alerts
|

Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging

Abstract: Abstract-Music auto-tagging is often handled in a similar manner to image classification by regarding the 2D audio spectrogram as image data. However, music auto-tagging is distinguished from image classification in that the tags are highly diverse and have different levels of abstractions. Considering this issue, we propose a convolutional neural networks (CNN)-based architecture that embraces multi-level and multi-scaled features. The architecture is trained in three steps. First, we conduct supervised featu… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
81
0

Year Published

2018
2018
2021
2021

Publication Types

Select...
5
1
1

Relationship

0
7

Authors

Journals

citations
Cited by 112 publications
(81 citation statements)
references
References 11 publications
0
81
0
Order By: Relevance
“…Systems using raw audio as input to DNN have been proposed in various domains, such as speech recognition [6], music classification, and audio tagging [7]. For example, raw audio has been exploited as features in speech recognition [6] and also for music auto-tagging [7].…”
Section: Related Workmentioning
confidence: 99%
See 3 more Smart Citations
“…Systems using raw audio as input to DNN have been proposed in various domains, such as speech recognition [6], music classification, and audio tagging [7]. For example, raw audio has been exploited as features in speech recognition [6] and also for music auto-tagging [7].…”
Section: Related Workmentioning
confidence: 99%
“…For example, raw audio has been exploited as features in speech recognition [6] and also for music auto-tagging [7]. More specifically, the concept of a 1D strided convolution layer that specially designs the first convolution layer for raw audio signals has been proposed [7].…”
Section: Related Workmentioning
confidence: 99%
See 2 more Smart Citations
“…Deep learning improves these results further, resulting in state-of-the-art performance. Convolutional recurrent neural networks, working on a low-level representation of sounds, have been used for learning features that would be useful in classification task [17,18]. While deep learning in itself performs very well, it creates new opportunities for the use of older machine learning methods.…”
Section: Introductionmentioning
confidence: 99%