Key-Invariant Convolutional Neural Network Toward Efficient Cover Song Identification

Xu, Xiaoshuo; Chen, Xiaoou; Yang, Dingcheng

doi:10.1109/icme.2018.8486531

Cited by 26 publications

(37 citation statements)

References 6 publications

“…With the current best setup, the total number of parameters is 6.3 M. We now motivate and present the key components of MOVE. Transposition-invariant architecture: Following the strategy proposed by Xu et al. [19], we increase the dimension of the crema-PCP inputs X from 12×T to 23×T by concatenating two copies of X in the pitch dimension and removing the last pitch class. The first convolutional layer, with a kernel size of 12×180, traverses the input, going through all possible transpositions in the pitch dimension, and the subsequent max-pooling layer, with a kernel size of 12×1, keeps the transposition with the highest activation value (convolutions in MOVE have no padding).…”

Section: Network Architecture (mentioning)

confidence: 99%
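
The transposition trick quoted above can be sketched in plain Python. This is only an illustration under stated assumptions, not the MOVE or Xu et al. implementation: the function names are hypothetical, a single kernel stands in for a whole convolutional layer, and the kernel width is 3 rather than 180 to keep the demo small.

```python
import random

def extend_pitch_axis(pcp):
    """Stack two copies of a 12 x T feature along pitch and drop the last row (23 x T)."""
    assert len(pcp) == 12
    return (pcp + pcp)[:-1]

def transposition_max_response(pcp, kernel):
    """Slide one 12 x W kernel over the 23 x T input without padding, then
    max-pool over the 12 pitch offsets (one offset per transposition)."""
    ext = extend_pitch_axis(pcp)
    width = len(kernel[0])
    steps = len(pcp[0]) - width + 1
    pooled = []
    for t in range(steps):
        responses = [
            sum(kernel[i][j] * ext[shift + i][t + j]
                for i in range(12) for j in range(width))
            for shift in range(12)  # 23 - 12 + 1 = 12 valid offsets
        ]
        pooled.append(max(responses))
    return pooled

def transpose(pcp, semitones):
    """Circularly shift the pitch-class rows by `semitones`."""
    k = semitones % 12
    return pcp[-k:] + pcp[:-k] if k else list(pcp)

rng = random.Random(0)
pcp = [[rng.random() for _ in range(8)] for _ in range(12)]     # toy 12 x 8 input
kernel = [[rng.random() for _ in range(3)] for _ in range(12)]  # toy 12 x 3 kernel
base = transposition_max_response(pcp, kernel)
shifted = transposition_max_response(transpose(pcp, 5), kernel)
print(base == shifted)
```

Each of the 12 offsets sees one circular rotation of the 12 pitch classes, so transposing the input only permutes the responses and leaves their maximum unchanged.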

“…
Method                        MAP     MR1

Results on Da-TACOS
2DFTM [17]                    0.275   155
SiMPle [18]                   0.332   142
Dmax [14]                     0.322   132
Qmax [10]                     0.365   113
Qmax* [30]                    0.373   104
EarlyFusion [12]              0.426   116
LateFusion [14]               0.454   177
MOVE w/ d = 4 k (ours)        0.489   43
MOVE w/ d = 16 k (ours)       0.506   42

Results on YTC
SiMPle [18]                   0.591   8
2DFTM sequences [29]          0.648   8
InNet [19]                    0.660   6
SuCo-DTW [31]                 0.800   3
CQT-TPPNet [20]               0.859   3
MOVE w/ d = 16 k (ours)       0.885   3

Table 2. Comparison of state-of-the-art VI systems (best results are highlighted in bold).…”

Section: Map Mr1 (mentioning)

confidence: 99%
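
The two figures reported per system above are commonly read as mean average precision (MAP, higher is better) and mean rank of the first correctly identified version (MR1, lower is better). A minimal sketch of how they could be computed from ranked retrieval results follows. The function names and the input format (one list of relevance flags per query, best match first) are assumptions for illustration, as is the choice to skip queries with no relevant item when averaging ranks.

```python
def average_precision(ranked_relevance):
    """Average precision for one query; flags are ordered best match first."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def first_relevant_rank(ranked_relevance):
    """1-based rank of the first relevant item, or None if there is none."""
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            return rank
    return None

def evaluate(queries):
    """Return (MAP, MR1) over a list of per-query relevance lists."""
    mean_ap = sum(average_precision(q) for q in queries) / len(queries)
    ranks = [r for r in map(first_relevant_rank, queries) if r is not None]
    mr1 = sum(ranks) / len(ranks) if ranks else float("nan")
    return mean_ap, mr1

# Two toy queries: versions found at ranks 1 and 3, and at rank 2.
toy_queries = [[True, False, True], [False, True, False]]
map_score, mr1_score = evaluate(toy_queries)
print(round(map_score, 4), mr1_score)
```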

Accurate and Scalable Version Identification Using Musically-Motivated Embeddings

Yesiler

Serrà

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The version identification (VI) task deals with the automatic detection of recordings that correspond to the same underlying musical piece. Despite many efforts, VI is still an open problem, with much room for improvement, especially with regard to combining accuracy and scalability. In this paper, we present MOVE, a musically-motivated method for accurate and scalable version identification. MOVE achieves state-of-the-art performance on two publicly-available benchmark sets by learning scalable embeddings in a Euclidean distance space, using a triplet loss and a hard triplet mining strategy. It improves over previous work by employing an alternative input representation, and introducing a novel technique for temporal content summarization, a standardized latent space, and a data augmentation strategy specifically designed for VI. In addition to the main results, we perform an ablation study to highlight the importance of our design choices, and study the relation between embedding dimensionality and model performance. Index Terms: Cover song identification, deep learning, music embedding, network encoder.
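
The abstract mentions a triplet loss with hard triplet mining in a Euclidean space but does not spell out the variant. The sketch below assumes a batch-hard scheme (farthest positive and closest negative per anchor) purely for illustration. The function names, the margin value, and the toy embeddings are hypothetical and not taken from MOVE.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Mean hinge loss over anchors, using the hardest positive and negative in the batch."""
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        positives = [euclidean(anchor, e)
                     for j, (e, l) in enumerate(zip(embeddings, labels))
                     if l == label and j != i]
        negatives = [euclidean(anchor, e)
                     for e, l in zip(embeddings, labels) if l != label]
        if not positives or not negatives:
            continue  # anchor cannot form a triplet in this batch
        losses.append(max(0.0, max(positives) - min(negatives) + margin))
    return sum(losses) / len(losses) if losses else 0.0

# Two well-separated "cliques" of versions, two recordings each.
toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
toy_labels = ["a", "a", "b", "b"]
print(batch_hard_triplet_loss(toy_embeddings, toy_labels, margin=1.0))
print(batch_hard_triplet_loss(toy_embeddings, toy_labels, margin=3.0))
```

With the smaller margin the cliques are already far enough apart and the loss vanishes. The larger margin still penalizes every anchor.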

“…Second Hand Songs 100K (SHS100K), collected from the Second Hand Songs website by [8], consists of 8858 songs with various covers and 108523 recordings. This dataset is split into three subsets (SHS100K-TRAIN, SHS100K-VAL and SHS100K-TEST) with a ratio of 8 : 1 : 1.…”

Section: Dataset (mentioning)

confidence: 99%
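
An 8 : 1 : 1 split like the one quoted above could be sketched as follows. The statement does not say whether the ratio applies to songs or to recordings, so splitting at the song level (which keeps all versions of a song in one subset) is an assumption here. The function name, the seed, and the integer song IDs are hypothetical.

```python
import random

def split_by_song(song_ids, ratios=(8, 1, 1), seed=0):
    """Shuffle song IDs deterministically and cut them into train/val/test by ratio."""
    ids = sorted(song_ids)
    random.Random(seed).shuffle(ids)
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder absorbs rounding
    return train, val, test

# 8858 placeholder song IDs, matching the song count in the quoted statement.
train_ids, val_ids, test_ids = split_by_song(range(8858))
print(len(train_ids), len(val_ids), len(test_ids))
```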

“…Each song in Youtube has 7 versions (2 original versions and 5 different versions), which results in 350 recordings in total. In our experiment, we use the 100 original versions as references and the others as queries, following [15,9,8].…”

Section: Dataset (mentioning)

confidence: 99%
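
The counts in the statement above are consistent with 50 songs (350 recordings / 7 versions per song), giving 100 references and 250 queries. The sketch below only checks that arithmetic. The song count is derived rather than stated in the quote, and the function name and the `(song, version)` tuples are hypothetical placeholders.

```python
def build_reference_query_split(n_songs=50, originals_per_song=2, covers_per_song=5):
    """Label each (song, version) pair as a reference (original) or a query (cover)."""
    references, queries = [], []
    for song in range(n_songs):
        for version in range(originals_per_song + covers_per_song):
            target = references if version < originals_per_song else queries
            target.append((song, version))
    return references, queries

references, queries = build_reference_query_split()
print(len(references), len(queries), len(references) + len(queries))
```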

Learning a Representation for Cover Song Identification Using Convolutional Neural Network

Chen

et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

Cover song identification represents a challenging task in the field of Music Information Retrieval (MIR) due to complex musical variations between query tracks and cover versions. Previous works typically utilize hand-crafted features and alignment algorithms for the task. More recently, further breakthroughs are achieved employing neural network approaches. In this paper, we propose a novel Convolutional Neural Network (CNN) architecture based on the characteristics of the cover song task. We first train the network through classification strategies; the network is then used to extract music representation for cover song identification. A scheme is designed to train robust models against tempo changes. Experimental results show that our approach outperforms state-of-the-art methods on all public datasets, improving the performance especially on the large dataset.

Acoustics-Text Dual-Modal Joint Representation Learning for Cover Song Identification

Gu, Jing Li, Jiayi Zhou

et al. 2023

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
