2019
DOI: 10.1007/s00521-019-04076-1

One deep music representation to rule them all? A comparative analysis of different representation learning strategies

Abstract: Inspired by the success of deploying deep learning in the fields of Computer Vision and Natural Language Processing, this learning paradigm has also found its way into the field of Music Information Retrieval. In order to benefit from deep learning in an effective, but also efficient manner, deep transfer learning has become a common approach. In this approach, it is possible to reuse the output of a pre-trained neural network as the basis for a new learning task. The underlying hypothesis is that if the initi…

Cited by 41 publications (29 citation statements)
References 55 publications (76 reference statements)
“…Few related works exist in the audio field - and every randomly weighted neural network we found in the audio literature was a mere baseline [2,7,24]. Inspired by previous computer vision works, we study which audio architectures work the best via evaluating how non-trained CNNs perform as feature extractors.…”
Section: Motivation - From Previous Work
Type: mentioning
confidence: 99%
“…A similar approach is followed in Section 3. Recent attempts to learn audio embeddings and similarity functions with neural networks has shown promising results [7]. Video Analysis: Video analytics software makes surveillance systems more efficient, by reducing the workload on security and management authorities.…”
Section: Related Work
Type: mentioning
confidence: 99%
“…Considering that lower layers of DCNNs usually capture lower-level features such as edges from images or spectrograms, we hypothesized that sharing lower layers among the various DCNNs can be effective under the scenario where multiple learning sources are available. With this approach, one can expect that it not only ensures sufficient specialization on task-specific upper layers, but also benefits from regularization effects on lower layers [14]. Joint learning of multiple tasks with shared layers can prevent the shared layer to overfit for a specific task, instead learning underlying factors that have commonalities required across tasks [6,19].…”
Section: Shared Architecture
Type: mentioning
confidence: 99%
“…To overcome these potential problems, we therefore apply a label pre-processing step, obtaining Artist Group Factors (AGF) as learning targets, rather than individual artist identities. Finally, we train Deep Convolutional Neural Networks (DCNNs) employing different learning setups, ranging from targeting genre and various types of AGFs with individual networks, to employing a shared architecture as introduced in multiple previous Multi-Task Learning (MTL) works [2,3,6,14,16,18,24,25].…”
Section: Introduction
Type: mentioning
confidence: 99%
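
The "Shared Architecture" statement above describes lower layers shared across tasks with task-specific upper layers. A minimal sketch of that idea follows, assuming plain dense layers in place of the convolutional layers the cited works use, and untrained random weights. The layer sizes, task names, class counts, and function names are all hypothetical and chosen only for illustration; none of them come from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: 128-dim input features, 64-dim shared representation.
N_IN, N_SHARED = 128, 64
# Hypothetical tasks and class counts (illustrative only).
TASKS = {"genre": 10, "agf": 40}

# Shared lower layer: a single set of weights used by every task.
W_shared = rng.normal(0.0, 0.1, size=(N_IN, N_SHARED))
b_shared = np.zeros(N_SHARED)

# Task-specific upper layers: one independent head per task.
heads = {
    name: (rng.normal(0.0, 0.1, size=(N_SHARED, n_cls)), np.zeros(n_cls))
    for name, n_cls in TASKS.items()
}

def shared_features(x):
    """Representation computed by the shared lower layer."""
    return relu(x @ W_shared + b_shared)

def predict(x, task):
    """Class probabilities for one task, built on the shared representation."""
    W, b = heads[task]
    return softmax(shared_features(x) @ W + b)

# A batch of 4 random input vectors standing in for audio features.
x = rng.normal(size=(4, N_IN))
probs = {task: predict(x, task) for task in TASKS}
```

In a transfer setting like the one the abstract outlines, the output of `shared_features` would be the part reused as input to a new learning task, while the per-task heads are discarded.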