“…Existing research on song search explores how to retrieve target songs through various kinds of given information [20,4,12,5]. Wang et al. [20] study how to use multi-granularity tags to query songs.…”
Section: Related Work (mentioning; confidence: 99%)
“…Wang et al. [20] study how to use multi-granularity tags to query songs. Buccoli et al. [4] explore how to search for songs through a text description. Leu et al. [12] and Chen et al. [5] make use of tune segments to search for target songs.…”
Section: Related Work (mentioning; confidence: 99%)
“…For the pre-trained chord embedding, we empirically limit the chord vocabulary to 500 entries and set the embedding dimension to 64. We use an off-the-shelf script⁴ to extract chord sequences from the LMD-full dataset [16] and train a skip-gram model [14] on those sequences. To build the heterogeneous graph, we limit the word vocabulary to 50k, and only the 12 most common chords are used as chord nodes.…”
Section: Implementation Detail (mentioning; confidence: 99%)
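The quoted setup maps naturally onto a standard skip-gram pre-training pipeline. The sketch below, using gensim, is a minimal illustration under the paper's stated hyperparameters (vocabulary capped at 500 chords, 64-dimensional embeddings); the chord labels and the extraction step are placeholders, not the authors' actual script.

```python
# Minimal sketch: pre-training chord embeddings with skip-gram (gensim).
# Assumed: chord sequences were already extracted from LMD-full, one list per song.
from gensim.models import Word2Vec

# Placeholder chord sequences; real input would be one sequence per MIDI file.
chord_sequences = [
    ["C:maj", "G:maj", "A:min", "F:maj"],
    ["D:min", "G:maj", "C:maj"],
]

model = Word2Vec(
    sentences=chord_sequences,
    vector_size=64,       # embedding dimension stated in the quote
    sg=1,                 # 1 = skip-gram (rather than CBOW)
    max_final_vocab=500,  # cap the chord vocabulary at 500 entries
    window=5,             # context window; not specified in the quote
    min_count=1,
    epochs=10,
)

chord_vec = model.wv["C:maj"]  # 64-dimensional embedding for one chord
```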
“…Then we fix the parameters of the graph attention layer for the chorus recognition task. For MMCR, we perform a grid search over learning rates {2e-4, 4e-4, 6e-4, 8e-4} and epoch counts {3, 4, 5, 6}, and find that the model trained with learning rate 6e-4 for 5 epochs works best. Training uses the Adam optimizer with a batch size of 128 and the default momentum terms.…”
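For concreteness, a grid search of this shape can be written as a simple loop. In the PyTorch sketch below, `build_model`, `train_loader`, and `evaluate` are hypothetical stand-ins for the paper's MMCR model and data pipeline; only the search grid, the optimizer, and the batch size come from the quote.

```python
# Minimal sketch of the quoted grid search over (learning rate, epochs) with Adam.
import itertools
import torch

def run_grid_search(build_model, train_loader, evaluate):
    best_config, best_score = None, -float("inf")
    for lr, n_epochs in itertools.product([2e-4, 4e-4, 6e-4, 8e-4], [3, 4, 5, 6]):
        model = build_model()
        # Adam with default betas=(0.9, 0.999), i.e. "default momentum"
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(n_epochs):
            for x, y in train_loader:  # batches of size 128, per the quote
                opt.zero_grad()
                loss = torch.nn.functional.cross_entropy(model(x), y)
                loss.backward()
                opt.step()
        score = evaluate(model)        # validation metric, e.g. accuracy
        if score > best_score:
            best_config, best_score = (lr, n_epochs), score
    return best_config, best_score    # the paper reports lr=6e-4, 5 epochs as best
```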
We discuss a novel task, Chorus Recognition, which could potentially benefit downstream tasks such as song search and music summarization. Unlike existing tasks such as music summarization or lyrics summarization, which rely on single-modal information, this paper models chorus recognition as a multi-modal task by utilizing both the lyrics and the tune of songs. We propose a multi-modal Chorus Recognition model that considers diverse features. We also create and publish the first Chorus Recognition dataset, containing 627 songs, for public use. Our empirical study on this dataset demonstrates that our approach outperforms several baselines in chorus recognition. In addition, our approach improves the accuracy of its downstream task, song search, by more than 10.6%.
“…The modeling of the high-level features (HLFs) from the set of learned features follows a classical training-based approach. Machine learning regressions allow us to adopt a dimensional representation for the semantic descriptors, which expresses the degree of intensity of each descriptor [8,15,20].…”
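One minimal way to realize this training-based setup is a separate regressor per descriptor, mapping the learned audio features to a continuous intensity value. The scikit-learn sketch below uses synthetic data; the ridge estimator, feature dimensionality, and descriptor names are illustrative assumptions, since the quote does not name a specific regression model.

```python
# Minimal sketch: one regression function per semantic descriptor,
# mapping learned audio features to a dimensional intensity score.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 32))  # learned features, one row per recording (assumed shape)
descriptor_scores = {            # questionnaire-derived intensities, one value per recording
    "brightness": rng.uniform(0, 1, 120),
    "warmth": rng.uniform(0, 1, 120),
}

models = {}
for name, y in descriptor_scores.items():
    models[name] = Ridge(alpha=1.0).fit(X, y)
    r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()
    print(f"{name}: cross-validated R^2 = {r2:.2f}")
```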
In this study we propose a set of semantic descriptors for characterizing the timbre of violins. The proposed semantic model follows a dimensional approach, which allows us to express the degree of intensity of each descriptor. A set of recordings of a number of violins (among them Stradivari, Amati, and Guarnieri instruments) was annotated with the descriptors through questionnaires. The recordings are processed with deep learning techniques to learn salient features from the audio signal in an unsupervised fashion. We then propose an automatic annotation procedure based on a set of regression functions, each modeling one semantic descriptor from the learned features.