Abstract—Measuring music similarity is essential for multimedia retrieval. For music items, this task can be regarded as obtaining a suitable distance measure between songs defined on a certain feature space. In this paper, we propose three such distance measures based on the audio content. First, a low-level measure based on tempo-related description. Second, a high-level semantic measure based on the inference of different musical dimensions by support vector machines. These dimensions include genre, culture, moods, instruments, rhythm, and tempo annotations. Third, a hybrid measure which combines the above-mentioned distance measures with two existing low-level measures: a Euclidean distance based on principal component analysis of timbral, temporal, and tonal descriptors, and a timbral distance based on single-Gaussian MFCC modeling. We evaluate our proposed measures against a number of baseline measures, objectively on a comprehensive set of music collections and subjectively through listeners' ratings. Results show that the proposed tempo-based and classifier-based measures achieve accuracies comparable to the baseline approaches, and that the highest accuracies are obtained by the hybrid distance. Furthermore, the proposed classifier-based approach opens up the possibility of exploring distance measures based on semantic notions.
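A hybrid measure of the kind described above can be sketched as a weighted combination of individually normalized sub-distances. The sketch below is an illustration only: the uniform default weights and the assumption that each sub-distance is pre-normalized to [0, 1] are not the paper's actual fusion scheme.

```python
def hybrid_distance(subdistances, weights=None):
    """Combine several per-pair distance values into one hybrid score.

    `subdistances` maps a measure name (e.g. "tempo", "semantic",
    "timbre") to a distance value assumed to be already normalized to
    [0, 1], e.g. by min-max scaling over the collection.  The uniform
    default weights are an illustrative assumption, not the paper's.
    """
    if weights is None:
        weights = {name: 1.0 for name in subdistances}
    total_weight = sum(weights[name] for name in subdistances)
    return sum(weights[name] * d for name, d in subdistances.items()) / total_weight
```

With equal weights, `hybrid_distance({"tempo": 0.2, "semantic": 0.6, "timbre": 0.4})` is simply the mean, 0.4; non-uniform weights let one sub-measure dominate the fusion.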
We present Essentia 2.0, an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPL license. It contains an extensive collection of reusable algorithms which implement audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors. The library is also wrapped in Python and includes a number of predefined executable extractors for the available music descriptors, which facilitates its use for fast prototyping and allows setting up research experiments very rapidly. Furthermore, it includes a Vamp plugin to be used with Sonic Visualiser for visualization purposes. The library is cross-platform and currently supports Linux, Mac OS X, and Windows systems. Essentia is designed with a focus on the robustness of the provided music descriptors and is optimized in terms of the computational cost of the algorithms. The provided functionality, specifically the music descriptors included in-the-box and signal processing algorithms, is easily expandable and allows for both research experiments and development of large-scale industrial applications.
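As an illustration of the kind of low-level spectral descriptor such a library provides, the sketch below computes a spectral centroid from a magnitude spectrum in plain Python. This is a simplified stand-in for didactic purposes, not Essentia's actual implementation.

```python
def spectral_centroid(magnitudes, sample_rate=44100.0):
    """Frequency-weighted mean of a magnitude spectrum.

    `magnitudes` holds the magnitude of each FFT bin, covering 0 Hz up
    to the Nyquist frequency inclusive.  Simplified illustration of a
    common low-level descriptor; not Essentia's implementation.
    """
    total = sum(magnitudes)
    if len(magnitudes) < 2 or total == 0.0:
        return 0.0
    # Bin k corresponds to frequency k * nyquist / (n_bins - 1).
    step = (sample_rate / 2.0) / (len(magnitudes) - 1)
    weighted = sum(k * step * m for k, m in enumerate(magnitudes))
    return weighted / total
```

Energy concentrated in low bins yields a low centroid ("dark" timbre); energy in high bins pulls the centroid toward the Nyquist frequency ("bright" timbre).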
Studying the ways to recommend music to a user is a central task within the music information research community. From a content-based point of view, this task can be regarded as obtaining a suitable distance measure between songs defined on a certain feature space. We propose two such distance measures. First, a low-level measure based on tempo-related aspects, and second, a high-level semantic measure based on regression by support vector machines of different groups of musical dimensions such as genre and culture, moods and instruments, or rhythm and tempo. We evaluate these distance measures against a number of state-of-the-art measures objectively, based on 17 ground truth musical collections, and subjectively, based on 12 listeners' ratings. Results show that, in spite of being conceptually different, the proposed methods achieve comparable or even higher performance than the considered baseline approaches. Furthermore, they open up the possibility to explore distance metrics that are based on truly semantic notions.
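The semantic measure described above compares songs through vectors of inferred musical dimensions. A minimal sketch, assuming each song is summarized as a vector of regressor outputs (one entry per genre, mood, or rhythm dimension) and using plain Euclidean distance; the paper's actual metric may differ:

```python
import math

def semantic_distance(a, b):
    """Euclidean distance between two semantic descriptor vectors.

    Each entry is assumed to be a classifier/regressor output (e.g. the
    inferred degree of a genre, mood, or rhythm label), with both
    vectors sharing the same ordering of dimensions.  Illustrative
    assumption: the paper may use a different metric.
    """
    if len(a) != len(b):
        raise ValueError("vectors must describe the same dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Because the dimensions carry semantic labels, a large contribution from one coordinate is directly interpretable, e.g. "these songs differ mostly in their inferred mood".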
Modeling the various aspects that make a music piece unique is a challenging task, requiring the combination of multiple sources of information. Deep learning is commonly used to obtain representations from such sources, including the audio, interactions between users and songs, and associated genre metadata. Recently, contrastive learning has led to representations that generalize better than those from traditional supervised methods. In this paper, we present a novel approach that combines multiple types of information related to music using cross-modal contrastive learning, allowing us to learn audio features from heterogeneous data simultaneously. We align the latent representations obtained from playlist-track interactions, genre metadata, and the tracks' audio by maximizing the agreement between these modality representations with a contrastive loss. We evaluate our approach on three tasks: genre classification, playlist continuation, and automatic tagging. We compare its performance with a baseline audio-based CNN trained to predict these modalities, and we also study the importance of including multiple sources of information when training our embedding model. The results suggest that the proposed method outperforms the baseline in all three downstream tasks and achieves performance comparable to the state-of-the-art.
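Cross-modal alignment of this kind is typically driven by an InfoNCE-style contrastive loss that pulls matching modality pairs together and pushes mismatched pairs apart. A plain-Python sketch for a single anchor follows; the temperature value and the use of cosine similarity are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import math

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor embedding.

    `anchor` and `positive` are representations of the same track from
    two modalities (e.g. audio and playlist interactions); `negatives`
    are representations of other tracks.  Vectors are compared by
    cosine similarity scaled by `temperature` (value is illustrative).
    """
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(x * x for x in v))
        return dot / (norm_u * norm_v)

    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    # Cross-entropy of picking the positive among positive + negatives.
    return -math.log(pos / (pos + neg))
```

The loss is near zero when the anchor is already aligned with its positive and far from the negatives, and grows when a negative is more similar to the anchor than the positive is.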
Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are designed to offer flexibility of use, easy extensibility, and real-time inference. To show the potential of this new interface with TensorFlow, we provide a number of pre-trained state-of-the-art music tagging and classification CNN models. We run an extensive evaluation of the developed models. In particular, we assess their generalization capabilities in a cross-collection evaluation using both external tag datasets and manual annotations tailored to the taxonomies of our models.
The ACM RecSys Challenge 2018 focuses on music recommendation in the context of automatic playlist continuation. In this paper, we describe our approach to the problem and the final hybrid system that was submitted to the challenge by our team Cocoplaya. This system combines the recommendations produced by two different models using ranking fusion. The first model is based on matrix factorization and incorporates information from the tracks' audio and playlist titles. The second model generates recommendations from typical track co-occurrences, considering their proximity in the playlists. The proposed approach is efficient and achieves good overall performance, with our model ranked 4th on the creative track of the challenge leaderboard.
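Ranking fusion merges the ranked lists produced by the two models into a single recommendation list. One common technique is reciprocal rank fusion, sketched below as an illustration; the team's exact fusion scheme may differ.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of track ids into one ranking.

    `rankings` is a list of lists, each ordered best-first.  Each track
    scores sum(1 / (k + rank)) over the lists that contain it; k=60 is
    the conventional smoothing constant from the RRF literature.  This
    is a generic illustration, not necessarily the submitted system.
    """
    scores = {}
    for ranking in rankings:
        for rank, track in enumerate(ranking, start=1):
            scores[track] = scores.get(track, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)
```

A track recommended by both models accumulates score from both lists, so agreement between the matrix-factorization and co-occurrence models naturally pushes a track toward the top of the fused ranking.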