Brian McFee scite author profile

librosa: Audio and Music Signal Analysis in Python

This document describes version 0.4.0 of librosa: a Python package for audio and music signal processing. At a high level, librosa provides implementations of a variety of common functions used throughout the field of music information retrieval. In this document, a brief overview of the library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.

Adaptive Pooling Operators for Weakly Labeled Sound Event Detection

McFee

Salamon

Bello

2018

IEEE/ACM Trans. Audio Speech Lang. Process.

144

130

Sound event detection (SED) methods are tasked with labeling segments of audio recordings by the presence of active sound sources. SED is typically posed as a supervised machine learning problem, requiring strong annotations for the presence or absence of each sound source at every time instant within the recording. However, strong annotations of this type are both labor- and cost-intensive for human annotators to produce, which limits the practical scalability of SED methods. In this work, we treat SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality. The models, however, must still produce temporally dynamic predictions, which must be aggregated (pooled) when comparing against static labels during training. To facilitate this aggregation, we develop a family of adaptive pooling operators, referred to as auto-pool, which smoothly interpolate between common pooling operators, such as min-, max-, or average-pooling, and automatically adapt to the characteristics of the sound sources in question. We evaluate the proposed pooling operators on three datasets, and demonstrate that in each case, the proposed methods outperform non-adaptive pooling operators for static prediction, and nearly match the performance of models trained with strong, dynamic annotations. The proposed method is evaluated in conjunction with convolutional neural networks, but can be readily applied to any differentiable model for time-series label prediction. While this article focuses on SED applications, the proposed methods are general, and could be applied widely to MIL problems in any domain.
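
The interpolation between min-, max-, and average-pooling described in this abstract can be illustrated with a small sketch. It assumes a softmax-weighted average controlled by a single scalar, which is one way to realize such an operator and is not taken from the abstract itself. The function name `auto_pool`, the parameter `alpha`, and the example scores are hypothetical; in a trained model a parameter like `alpha` would presumably be learned rather than fixed by hand.

```python
import math

def auto_pool(scores, alpha):
    """Softmax-weighted average of per-frame scores (illustrative sketch).

    alpha = 0 gives the plain mean; a large positive alpha approaches the
    max; a large negative alpha approaches the min.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    # Subtract the largest exponent before exponentiating, for numerical stability.
    shift = max(alpha * s for s in scores)
    weights = [math.exp(alpha * s - shift) for s in scores]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

# Hypothetical per-frame predictions for one sound class in one excerpt.
frame_scores = [0.1, 0.2, 0.9, 0.3]

mean_like = auto_pool(frame_scores, 0.0)     # behaves like average-pooling
max_like = auto_pool(frame_scores, 100.0)    # behaves like max-pooling
min_like = auto_pool(frame_scores, -100.0)   # behaves like min-pooling
```

Because the pooled value is a smooth function of `alpha`, a gradient-based learner could in principle tune it alongside the rest of a model, which is consistent with the abstract's remark that the approach applies to any differentiable model.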

The million song dataset challenge

et al. 2012

Learning Content Similarity for Music Recommendation

McFee

Barrington

Lanckriet

2012

IEEE Trans. Audio Speech Lang. Process.

103

Many tasks in music information retrieval, such as recommendation, and playlist generation for online radio, fall naturally into the query-by-example setting, wherein a user queries the system by providing a song, and the system responds with a list of relevant or similar song recommendations. Such applications ultimately depend on the notion of similarity between items to produce high-quality results. Current state-of-the-art systems employ collaborative filter methods to represent musical items, effectively comparing items in terms of their constituent users. While collaborative filter techniques perform well when historical data is available for each item, their reliance on historical data impedes performance on novel or unpopular items. To combat this problem, practitioners rely on content-based similarity, which naturally extends to novel items, but is typically out-performed by collaborative filter methods. In this article, we propose a method for optimizing content-based similarity by learning from a sample of collaborative filter data. The optimized content-based similarity metric can then be applied to answer queries on novel and unpopular items, while still maintaining high recommendation accuracy. The proposed system yields accurate and efficient representations of audio content, and experimental results show significant improvements in accuracy over competing content-based recommendation techniques.

Index Terms: Audio retrieval and recommendation, music information retrieval, query-by-example, collaborative filters, structured prediction.

Music Recommender Systems

Schedl

Knees

McFee

et al. 2015

Brian McFee

librosa: Audio and Music Signal Analysis in Python
