Jordi Janer scite author profile

Monoaural Audio Source Separation Using Deep Convolutional Neural Networks

In this paper we introduce a low-latency monaural source separation framework using a Convolutional Neural Network (CNN). The CNN estimates time-frequency soft masks, which are then applied to the mixture for source separation. We evaluate the performance of the network on a database of music mixtures comprising voice, drums and bass, plus an "other" category of instruments that varies from song to song. The proposed architecture is compared to a Multilayer Perceptron (MLP), achieving results on par with it while significantly reducing processing time. The algorithm was submitted to source separation evaluation campaigns to test its efficiency, and achieved competitive results.
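The masking step in this abstract can be illustrated independently of the network. The NumPy sketch below assumes the CNN (not shown) has already produced one non-negative magnitude estimate per source; `ratio_masks` and `apply_masks` are hypothetical names, and the ratio-mask formula is a common choice for soft masks, not necessarily the exact one used in the paper.

```python
import numpy as np


def ratio_masks(source_mags, eps=1e-8):
    """Convert per-source magnitude estimates, shape (S, F, T), into soft masks.

    Each mask is the source's share of the summed magnitudes, so every mask
    lies in [0, 1] and the masks sum to (almost exactly) 1 in each bin.
    """
    source_mags = np.asarray(source_mags, dtype=float)
    total = source_mags.sum(axis=0, keepdims=True) + eps  # eps avoids 0/0
    return source_mags / total


def apply_masks(mixture_stft, masks):
    """Apply each soft mask to the complex mixture STFT of shape (F, T).

    The mixture phase is reused for every source estimate.
    """
    return masks * mixture_stft[np.newaxis, :, :]


# Toy example: a random "mixture" STFT and hypothetical magnitude estimates
# for four sources (voice, drums, bass, other).
rng = np.random.default_rng(0)
n_freq, n_frames = 6, 5
mixture = rng.standard_normal((n_freq, n_frames)) + 1j * rng.standard_normal((n_freq, n_frames))
estimates = rng.random((4, n_freq, n_frames))

masks = ratio_masks(estimates)
sources = apply_masks(mixture, masks)
print(masks.shape, sources.shape)  # (4, 6, 5) (4, 6, 5)
```

Because the masks sum to one in every bin, the masked estimates add back up to the mixture (up to `eps`), which is one reason ratio masks are a popular choice.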

Remixing music using source separation algorithms to improve the musical experience of cochlear implant users

Pons, Janer, Rode et al. 2016

Music perception remains rather poor for many Cochlear Implant (CI) users because of their deficient pitch perception. However, comprehensible vocals and simple musical structures are well perceived by many CI users. In previous studies, researchers re-mixed songs to make music more enjoyable for them, emphasizing the preferred musical elements (vocals or beat) and attenuating the others. However, re-mixing music requires the individually recorded tracks (multitracks), which are usually not available. To overcome this limitation, Source Separation (SS) techniques are proposed to estimate the multitracks. These estimated multitracks are then re-mixed to create music that is more pleasant for CI users. However, SS may introduce audible distortions and artifacts. Experiments conducted with CI users (N = 9) and normal-hearing listeners (N = 9) show that CI users can have different mixing preferences than normal-hearing listeners. Moreover, CI users' mixing preferences are shown to be user dependent. It is also shown that SS methods can be used successfully to create preferred re-mixes even though distortions and artifacts are present. Finally, CI users' preferences are used to propose a benchmark that defines the maximum acceptable levels of SS distortion and artifacts for two different mixes proposed by CI users.
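The re-mixing step amounts to a weighted sum of the estimated stems. The sketch below assumes the stems have already been separated and time-aligned; `remix`, the stem names and the gain values are illustrative assumptions, not the mixes or settings used in the study.

```python
import numpy as np


def remix(stems, gains_db):
    """Re-mix estimated stems with per-stem gains given in decibels.

    stems:    dict mapping stem name -> 1-D signal (all of equal length)
    gains_db: dict mapping stem name -> gain in dB; stems not listed keep 0 dB
    """
    out = None
    for name, signal in stems.items():
        gain = 10.0 ** (gains_db.get(name, 0.0) / 20.0)  # dB -> linear amplitude
        contribution = gain * np.asarray(signal, dtype=float)
        out = contribution if out is None else out + contribution
    return out


# Hypothetical vocal-favoring mix: keep the voice, attenuate drums by 20 dB.
voice = np.array([1.0, -1.0, 0.5, 0.0])
drums = np.array([0.2, 0.2, -0.2, 0.4])
mix = remix({"voice": voice, "drums": drums}, {"drums": -20.0})
print(mix)
```

Expressing gains in dB makes it easy to encode a preference for vocals or beat: 0 dB leaves a stem unchanged, and about -6 dB roughly halves its amplitude.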

Score-Informed Source Separation for Multichannel Orchestral Recordings

Miron, Carabias-Orti, Bosch et al. 2016

Journal of Electrical and Computer Engineering

This paper proposes a system for score-informed audio source separation for multichannel orchestral recordings. The orchestral repertoire relies on the existence of scores, so reliable separation requires a good alignment of the score with the audio of the performance. However, automatic score alignment methods are reliable only when a tolerance window around the actual onsets and offsets is allowed. Moreover, several factors increase the difficulty of the task: a highly reverberant image, large ensembles with rich polyphony, and a wide variety of instruments recorded with a distant-microphone setup. To address these problems, we design context-specific methods, such as refining the score-following output to obtain a more precise alignment. We also extend a close-microphone separation framework to handle distant-microphone orchestral recordings. Then, we propose the first open evaluation dataset in this musical context, including annotations of the notes played by multiple instruments of an orchestral ensemble. The evaluation analyzes how important parts of the separation framework, and their interactions, affect separation quality. Results show that we are able to align the original score with the audio of the performance and to separate the sources corresponding to the instrument sections.
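The tolerance window mentioned above can be made concrete: an aligned note is trusted only approximately, so its active region is widened by a few frames on each side before it constrains the separation. This is a minimal sketch under assumed conventions (notes given as `(pitch_index, onset_frame, offset_frame)` with an exclusive offset, and a hypothetical `tol_frames` parameter); it is not the paper's refinement method.

```python
import numpy as np


def note_activity_mask(notes, n_pitches, n_frames, tol_frames):
    """Binary pitch-by-frame activity matrix built from aligned score notes.

    notes: iterable of (pitch_index, onset_frame, offset_frame), offset exclusive.
    Each note is widened by tol_frames on both sides to absorb alignment
    errors, then clipped to the valid frame range.
    """
    mask = np.zeros((n_pitches, n_frames), dtype=bool)
    for pitch, onset, offset in notes:
        start = max(0, onset - tol_frames)
        stop = min(n_frames, offset + tol_frames)
        if start < stop:
            mask[pitch, start:stop] = True
    return mask


# Three hypothetical aligned notes on a 10-frame excerpt, 1-frame tolerance.
notes = [(0, 2, 4), (1, 0, 1), (2, 8, 10)]
mask = note_activity_mask(notes, n_pitches=3, n_frames=10, tol_frames=1)
print(mask.astype(int))
```

A wider window tolerates larger alignment errors but lets more interference from neighboring notes into each source estimate; this trade-off is one motivation for refining the alignment itself.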

Low-Latency Instrument Separation in Polyphonic Audio Using Timbre Models

Marxer, Janer, Bonada 2012

This research focuses on the removal of the singing voice from polyphonic audio recordings under real-time constraints. It is based on time-frequency binary masks that combine spectral-bin classification (using azimuth, phase difference and absolute frequency) with harmonic-derived masks. For the harmonic-derived masks, a pitch likelihood estimation technique based on Tikhonov regularization is proposed. Pitch tracking of the target instrument makes use of supervised timbre models. The approach runs in real time on off-the-shelf computers with a latency below 250 ms. The method was compared to an offline state-of-the-art Non-negative Matrix Factorization (NMF) technique and to ideal binary mask separation. For the evaluation we used a dataset of multitrack versions of professional audio recordings.
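Tikhonov regularization, in its generic form, adds a penalty λ‖x‖² to a least-squares fit, giving the closed-form solution x = (A^T A + λI)^(-1) A^T b. The sketch below applies that generic form to fitting idealized harmonic templates to a magnitude spectrum, so that large weights indicate likely pitches. The template shape (1/h harmonic decay), the bin layout and the function names are assumptions for illustration; the paper's actual pitch likelihood formulation may differ.

```python
import numpy as np


def harmonic_templates(n_bins, f0_bins, n_harmonics=5):
    """Build a (n_bins, n_candidates) matrix of idealized harmonic combs.

    Column j has energy at integer multiples of f0_bins[j]; the h-th harmonic
    gets amplitude 1/h (an assumed spectral envelope).
    """
    A = np.zeros((n_bins, len(f0_bins)))
    for j, f0 in enumerate(f0_bins):
        for h in range(1, n_harmonics + 1):
            if h * f0 < n_bins:
                A[h * f0, j] = 1.0 / h
    return A


def tikhonov_pitch_weights(A, spectrum, lam=0.01):
    """Solve min_x ||A x - spectrum||^2 + lam * ||x||^2 in closed form."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ spectrum)


# Toy check: a spectrum built from the template whose f0 is at bin 6.
f0_bins = [4, 5, 6, 7, 8]
A = harmonic_templates(n_bins=64, f0_bins=f0_bins)
spectrum = A[:, f0_bins.index(6)]
weights = tikhonov_pitch_weights(A, spectrum)
best = f0_bins[int(np.argmax(weights))]
print(best)  # 6
```

Increasing `lam` shrinks all weights and stabilizes the solution when templates overlap (here the combs for bins 4 and 8 share harmonics), at the cost of some bias toward zero.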

Ecological Acoustics Perspective for Content-Based Retrieval of Environmental Sounds

Roma, Janer, Kersten et al. 2010

EURASIP Journal on Audio, Speech, and Music Processing

In this paper we present a method to search for environmental sounds in large unstructured databases of user-submitted audio, using a general taxonomy of sound events from ecological acoustics. We discuss the use of Support Vector Machines to classify sound recordings according to the taxonomy and describe two use cases for the resulting classification models: a content-based web search interface for a large audio database, and a method for segmenting field recordings to assist sound design.

Jordi Janer

Monoaural Audio Source Separation Using Deep Convolutional Neural Networks

Remixing music using source separation algorithms to improve the musical experience of cochlear implant users

Score-Informed Source Separation for Multichannel Orchestral Recordings

Low-Latency Instrument Separation in Polyphonic Audio Using Timbre Models

Ecological Acoustics Perspective for Content-Based Retrieval of Environmental Sounds
