Douglas Turnbull scite author profile

Abstract-We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features and words. We collect a data set of 1700 human-generated annotations that describe 500 Western popular music tracks. For each word in a vocabulary, we use this data to train a Gaussian mixture model (GMM) over an audio feature space. We estimate the parameters of the model using the weighted mixture hierarchies expectation maximization algorithm. This algorithm is more scalable to large data sets and produces better density estimates than standard parameter estimation techniques. The quality of the music annotations produced by our system is comparable with the performance of humans on the same task. Our "query-by-text" system can retrieve appropriate songs for a large number of musically relevant words. We also show that our audition system is general by learning a model that can annotate and retrieve sound effects.Index Terms-Audio annotation and retrieval, music information retrieval, semantic music analysis.

show abstract

Playlist prediction via metric embedding

Chen

et al. 2012

View full text Add to dashboard Cite

Digital storage of personal music collections and cloud-based music services (e.g. Pandora, Spotify) have fundamentally changed how music is consumed. In particular, automatically generated playlists have become an important mode of accessing large music collections. The key goal of automated playlist generation is to provide the user with a coherent listening experience. In this paper, we present Latent Markov Embedding (LME), a machine learning algorithm for generating such playlists. In analogy to matrix factorization methods for collaborative filtering, the algorithm does not require songs to be described by features a priori, but it learns a representation from example playlists. We formulate this problem as a regularized maximum-likelihood embedding of Markov chains in Euclidian space, and show how the resulting optimization problem can be solved efficiently. An empirical evaluation shows that the LME is substantially more accurate than adaptations of smoothed n-gram models commonly used in natural language processing.

show abstract

Towards musical query-by-semantic-description using the CAL500 data set

Turnbull

Barrington

Torres

et al. 2007

View full text Add to dashboard Cite

Query-by-semantic-description (QBSD) is a natural paradigm for retrieving content from large databases of music. A major impediment to the development of good QBSD systems for music information retrieval has been the lack of a cleanlylabeled, publicly-available, heterogeneous data set of songs and associated annotations. We have collected the Computer Audition Lab 500-song (CAL500) data set by having humans listen to and annotate songs using a survey designed to capture 'semantic associations' between music and words. We adapt the supervised multi-class labeling (SML) model, which has shown good performance on the task of image retrieval, and use the CAL500 data to learn a model for music retrieval. The model parameters are estimated using the weighted mixture hierarchies expectation-maximization algorithm which has been specifically designed to handle realvalued semantic association between words and songs, rather than binary class labels. The output of the SML model, a vector of class-conditional probabilities, can be interpreted as a semantic multinomial distribution over a vocabulary. By also representing a semantic query as a query multinomial distribution, we can quickly rank order the songs in a database based on the Kullback-Leibler divergence between the query multinomial and each song's semantic multinomial. Qualitative and quantitative results demonstrate that our SML model can both annotate a novel song with meaningful words and retrieve relevant songs given a multi-word, text-based query.

show abstract

User-centered design of a social game to tag music

Barrington

O'Malley

Turnbull

et al. 2009

View full text Add to dashboard Cite

We present "Herd It", a competitive, online, multi-player game that has the implicit benefit of collecting tags for music. We describe Herd It's user-centered design process and demonstrate that the game can collect both musical and social data. This data can be used to build machine learning models that automatically associate music with tags. Herd It differs from previous "games with a purpose" in that it is designed to be social: the game runs on the Facebook online social network and scoring is based on consensus between a large group of listeners -"the Herd". By presenting music in a social context, Herd It adds demographic context to the semantic music descriptions that it collects. KeywordsAuditory I/O and sound in the UI, Distributed knowledge acquisition, Computer audition, Web-based games with a purpose

show abstract

Feature selection for content-based, time-varying musical emotion regression

Schmidt

Turnbull

Kim

2010

View full text Add to dashboard Cite

In developing automated systems to recognize the emotional content of music, we are faced with a problem spanning two disparate domains: the space of human emotions and the acoustic signal of music. To address this problem, we must develop models for both data collected from humans describing their perceptions of musical mood and quantitative features derived from the audio signal. In previous work, we have presented a collaborative game, MoodSwings, which records dynamic (per-second) mood ratings from multiple players within the two-dimensional Arousal-Valence representation of emotion. Using this data, we present a system linking models of acoustic features and human data to provide estimates of the emotional content of music according to the arousal-valence space. Furthermore, in keeping with the dynamic nature of musical mood we demonstrate the potential of this approach to track the emotional changes in a song over time. We investigate the utility of a range of acoustic features based on psychoacoustic and music-theoretic representations of the audio for this application. Finally, a simplified version of our system is re-incorporated into MoodSwings as a simulated partner for single-players, providing a potential platform for furthering perceptual studies and modeling of musical mood.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.