Daniel P. W. Ellis scite author profile

We present the CP-JKU submission to MediaEval 2019: a Receptive Field (RF)-regularized and Frequency-Aware CNN approach for tagging music with emotion/mood labels. We investigate the impact of the RF of the CNNs on their performance on this dataset. We observe that ResNets with smaller receptive fields, originally adapted for acoustic scene classification, also perform well in the emotion tagging task. We improve the performance of such architectures using techniques such as Frequency Awareness and Shake-Shake regularization, which were used in previous work on general acoustic recognition tasks. The source code is published at https
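Since the abstract above centers on the receptive field of a CNN, a small illustration of how that quantity is usually computed may help. The sketch below uses the standard recurrence for stacked convolution or pooling layers along one axis and ignores dilation. The function name and the layer lists are hypothetical and are not taken from the paper or its source code.

```python
def receptive_field(layers):
    """Receptive field, in input samples along one axis, of stacked layers.

    `layers` is a list of (kernel_size, stride) pairs ordered from input
    to output. Dilation is not modeled in this sketch.
    """
    rf = 1    # a single input position sees only itself
    jump = 1  # spacing, in input samples, between neighboring outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each layer widens the view
        jump *= stride                  # strides spread later layers out
    return rf


# Hypothetical layer stacks, chosen only to show the effect of depth.
shallow = [(3, 1), (3, 2), (3, 1)]
deep = shallow + [(3, 2), (3, 1), (3, 1)]
print(receptive_field(shallow), receptive_field(deep))
```

Under these assumptions, adding layers or strides grows the receptive field quickly, which is the quantity an RF-regularized design would try to keep small.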

Audio Set: An ontology and human-labeled dataset for audio events

Gemmeke et al. 2017

librosa: Audio and Music Signal Analysis in Python

McFee, Raffel, Liang et al. 2015

2,074

933

This document describes version 0.4.0 of librosa: a Python package for audio and music signal processing. At a high level, librosa provides implementations of a variety of common functions used throughout the field of music information retrieval. In this document, a brief overview of the library's functionality is provided, along with explanations of the design goals, software development practices, and notational conventions.

Model-Based Expectation-Maximization Source Separation and Localization

Mandel, Weiss, Ellis 2010, IEEE Trans. Audio Speech Lang. Process.

263

422

This paper describes a system, referred to as MESSL, for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording. By clustering individual spectrogram points based on their interaural phase and level differences, MESSL generates masks that can be used to isolate individual sound sources. We first describe a probabilistic model of interaural parameters that can be evaluated at individual spectrogram points. By creating a mixture of these models over sources and delays, the multi-source localization problem is reduced to a collection of single-source problems. We derive an expectation-maximization algorithm for computing the maximum-likelihood parameters of this mixture model, and show that these parameters correspond well with interaural parameters measured in isolation. As a byproduct of fitting this mixture model, the algorithm creates probabilistic spectrogram masks that can be used for source separation. In simulated anechoic and reverberant environments, separations using MESSL produced on average a signal-to-distortion ratio 1.6 dB greater and PESQ results 0.27 mean opinion score units greater than four comparable algorithms.
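To make the interaural phase and level differences mentioned in the abstract concrete, here is a minimal sketch of how the two cues could be computed at a single spectrogram point. It is not the MESSL model or its EM procedure. The function name, the left-relative-to-right sign convention, and the `eps` guard are assumptions made for illustration.

```python
import cmath
import math


def interaural_cues(left, right, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for one time-frequency point.

    `left` and `right` are the complex STFT values of the two channels.
    Both cues are measured as left relative to right (an assumed convention).
    """
    # Level difference: ratio of magnitudes in decibels; eps avoids log(0).
    ild_db = 20.0 * math.log10((abs(left) + eps) / (abs(right) + eps))
    # Phase difference: angle of left * conj(right), wrapped to (-pi, pi].
    ipd_rad = cmath.phase(left * right.conjugate())
    return ild_db, ipd_rad


# Hypothetical point: left is twice as loud and leads by 0.3 rad.
ild, ipd = interaural_cues(cmath.rect(2.0, 0.5), cmath.rect(1.0, 0.2))
print(round(ild, 2), round(ipd, 2))
```

A separation system along the lines described would then group points with similar cue values into sources. The grouping step is left out here because the abstract does not give enough detail to reproduce it.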

The ICSI Meeting Corpus

et al.


CNN architectures for large-scale audio classification
