“…Existing research on song search explores how to retrieve target songs through various kinds of given information [20,4,12,5]. Wang et al. [20] study how to use multi-granularity tags to query songs.…”
Section: Related Work (mentioning; confidence: 99%)
“…Wang et al. [20] study how to use multi-granularity tags to query songs. Buccoli et al. [4] explore how to search for songs through a text description. Leu et al. [12] and Chen et al. [5] make use of tune segments to search for target songs.…”
Section: Related Work (mentioning; confidence: 99%)
“…For the pre-trained chord embedding, we empirically limit the chord vocabulary to 500 entries and set the embedding dimension to 64. We use an off-the-shelf script⁴ to extract chord sequences from the LMD-full dataset [16] and train a skip-gram model [14] on those sequences. To build the heterogeneous graph, we limit the word vocabulary to 50k, and only the 12 most common chords are used as chord nodes.…”
Section: Implementation Detail (mentioning; confidence: 99%)
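The quoted setup maps naturally onto a standard skip-gram pre-training pipeline. The sketch below, using gensim, is a minimal illustration under the paper's stated hyperparameters (vocabulary capped at 500 chords, 64-dimensional embeddings); the chord labels and the extraction step are placeholders, not the authors' actual script.

```python
# Minimal sketch: pre-training chord embeddings with skip-gram (gensim).
# Assumed: chord sequences were already extracted from LMD-full, one list per song.
from gensim.models import Word2Vec

# Placeholder chord sequences; real input would be one sequence per MIDI file.
chord_sequences = [
    ["C:maj", "G:maj", "A:min", "F:maj"],
    ["D:min", "G:maj", "C:maj"],
]

model = Word2Vec(
    sentences=chord_sequences,
    vector_size=64,       # embedding dimension stated in the quote
    sg=1,                 # 1 = skip-gram (rather than CBOW)
    max_final_vocab=500,  # cap the chord vocabulary at 500 entries
    window=5,             # context window; not specified in the quote
    min_count=1,
    epochs=10,
)

chord_vec = model.wv["C:maj"]  # 64-dimensional embedding for one chord
```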
“…Then we fix the parameters of the graph attention layer for the chorus recognition task. For MMCR, we perform a grid search over learning rates {2e-4, 4e-4, 6e-4, 8e-4} and epoch counts {3, 4, 5, 6}, and find that the model trained with learning rate 6e-4 for 5 epochs works best. Training uses the Adam optimizer with a batch size of 128 and the default momentum terms.…”
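For concreteness, a grid search of this shape can be written as a simple loop. In the PyTorch sketch below, `build_model`, `train_loader`, and `evaluate` are hypothetical stand-ins for the paper's MMCR model and data pipeline; only the search grid, the optimizer, and the batch size come from the quote.

```python
# Minimal sketch of the quoted grid search over (learning rate, epochs) with Adam.
import itertools
import torch

def run_grid_search(build_model, train_loader, evaluate):
    best_config, best_score = None, -float("inf")
    for lr, n_epochs in itertools.product([2e-4, 4e-4, 6e-4, 8e-4], [3, 4, 5, 6]):
        model = build_model()
        # Adam with default betas=(0.9, 0.999), i.e. "default momentum"
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(n_epochs):
            for x, y in train_loader:  # batches of size 128, per the quote
                opt.zero_grad()
                loss = torch.nn.functional.cross_entropy(model(x), y)
                loss.backward()
                opt.step()
        score = evaluate(model)        # validation metric, e.g. accuracy
        if score > best_score:
            best_config, best_score = (lr, n_epochs), score
    return best_config, best_score    # the paper reports lr=6e-4, 5 epochs as best
```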
We discuss a novel task, Chorus Recognition, which could potentially benefit downstream tasks such as song search and music summarization. Unlike existing tasks such as music summarization or lyrics summarization, which rely on single-modal information, this paper models chorus recognition as a multi-modal task by utilizing both the lyrics and the tune of songs. We propose a multi-modal Chorus Recognition model that considers diverse features. We also create and publish the first Chorus Recognition dataset, containing 627 songs, for public use. Our empirical study on this dataset demonstrates that our approach outperforms several baselines in chorus recognition. In addition, our approach improves the accuracy of its downstream task, song search, by more than 10.6%.
“…The modeling of the high-level features (HLFs) from the set of learned features follows a classical training-based approach. Machine learning regressions allow us to adopt a dimensional representation for the semantic descriptors, which expresses the degree of intensity of each descriptor [8,15,20].…”
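One minimal way to realize this training-based setup is a separate regressor per descriptor, mapping the learned audio features to a continuous intensity value. The scikit-learn sketch below uses synthetic data; the ridge estimator, feature dimensionality, and descriptor names are illustrative assumptions, since the quote does not name a specific regression model.

```python
# Minimal sketch: one regression function per semantic descriptor,
# mapping learned audio features to a dimensional intensity score.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 32))  # learned features, one row per recording (assumed shape)
descriptor_scores = {            # questionnaire-derived intensities, one value per recording
    "brightness": rng.uniform(0, 1, 120),
    "warmth": rng.uniform(0, 1, 120),
}

models = {}
for name, y in descriptor_scores.items():
    models[name] = Ridge(alpha=1.0).fit(X, y)
    r2 = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2").mean()
    print(f"{name}: cross-validated R^2 = {r2:.2f}")
```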
In this study we propose a set of semantic descriptors for characterizing the timbre of violins. The proposed semantic model follows a dimensional approach, which allows us to express the degree of intensity of each descriptor. A set of recordings of a number of violins (among them Stradivari, Amati, and Guarnieri instruments) was annotated with the descriptors through questionnaires. The recordings are processed with deep learning techniques to learn salient features from the audio signal in an unsupervised fashion. We then propose an automatic annotation procedure based on a set of regression functions, each modeling one semantic descriptor from the learned features.