Rachel M. Bittner scite author profile

Citation: Jansson, A., Bittner, R. M., Ewert, S. and Weyde, T. ORCID: 0000-0001-8028-9905 (2019). Joint singing voice separation and F0 estimation with deep U-net architectures.

Abstract: Vocal source separation and fundamental frequency estimation in music are tightly related tasks. The outputs of vocal source separation systems have previously been used as inputs to vocal fundamental frequency estimation systems; conversely, vocal fundamental frequency has been used as side information to improve vocal source separation. In this paper, we propose several different approaches for jointly separating vocals and estimating fundamental frequency. We show that joint learning is advantageous for these tasks, and that a stacked architecture which first performs vocal separation outperforms the other configurations considered. Furthermore, the best joint model achieves state-of-the-art results for vocal-f0 estimation on the iKala dataset. Finally, we highlight the importance of performing polyphonic, rather than monophonic vocal-f0 estimation for many real-world cases.

Open-Source Practices for Music Signal Processing Research: Recommendations for Transparent, Sustainable, and Reproducible Audio Research

McFee

Kim

Cartwright

et al. 2019

IEEE Signal Process. Mag.

Neural Music Synthesis for Flexible Timbre Control

Kim

Bittner

Kumar

et al. 2019

The recent success of raw audio waveform synthesis models like WaveNet motivates a new approach for music synthesis, in which the entire process (creating audio samples from a score and instrument information) is modeled using generative neural networks. This paper describes a neural music synthesis model with flexible timbre controls, which consists of a recurrent neural network conditioned on a learned instrument embedding followed by a WaveNet vocoder. The learned embedding space successfully captures the diverse variations in timbres within a large dataset and enables timbre control and morphing by interpolating between instruments in the embedding space. The synthesis quality is evaluated both numerically and perceptually, and an interactive web demo is presented.
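The timbre morphing mentioned in this abstract relies on interpolating between instrument embeddings. A minimal sketch of that idea follows, assuming plain linear interpolation between two embedding vectors; the abstract does not state the interpolation scheme, and the function name `interpolate_embeddings`, the 4-dimensional vectors, and the instrument labels are all hypothetical illustrations rather than anything taken from the paper.

```python
import numpy as np


def interpolate_embeddings(emb_a, emb_b, alpha):
    """Linearly blend two embedding vectors (hypothetical helper).

    alpha = 0.0 returns emb_a, alpha = 1.0 returns emb_b.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    return (1.0 - alpha) * emb_a + alpha * emb_b


# Made-up 4-dimensional embeddings standing in for two learned instruments.
flute = np.array([0.2, -1.0, 0.5, 0.0])
violin = np.array([1.0, 0.0, -0.5, 2.0])

# Five evenly spaced points on the straight line from flute to violin.
morph_path = [
    interpolate_embeddings(flute, violin, a) for a in np.linspace(0.0, 1.0, 5)
]
```

In a system like the one described, each point on `morph_path` would be fed to the synthesis network in place of a single instrument's embedding; that conditioning step is not shown here.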

Kernel Additive Modeling for interference reduction in multi-channel music recordings

Prätzlich

Bittner

Müller

2015

When recording a live musical performance, the different voices, such as the instrument groups or soloists of an orchestra, are typically recorded in the same room simultaneously, with at least one microphone assigned to each voice. However, it is difficult to acoustically shield the microphones. In practice, each one contains interference from every other voice. In this paper, we aim to reduce these interferences in multi-channel recordings to recover only the isolated voices. Following the recently proposed Kernel Additive Modeling framework, we present a method that iteratively estimates both the power spectral density of each voice and the corresponding strength in each microphone signal. With this information, we build an optimal Wiener filter, strongly reducing interferences. The trade-off between distortion and separation can be controlled by the user through the number of iterations of the algorithm. Furthermore, we present a computationally effective approximation of the iterative procedure. Listening tests demonstrate the effectiveness of the method.
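The final filtering step described in this abstract can be illustrated with a short sketch. It assumes the power spectral density of each voice and its strength in each microphone have already been estimated (the iterative Kernel Additive Modeling estimation is not shown), and it uses the standard Wiener gain, the target's share of the total power in each time-frequency bin. The function name `wiener_mask`, the array shapes, and the random test data are hypothetical and not taken from the paper.

```python
import numpy as np


def wiener_mask(psd, gains, mic, voice, eps=1e-12):
    """Wiener-style gain for one voice in one microphone (hypothetical helper).

    psd:   (J, F, T) nonnegative power spectral density estimate per voice.
    gains: (I, J) nonnegative strength of voice j in microphone i.
    Returns an (F, T) mask with values between 0 and 1.
    """
    psd = np.asarray(psd, dtype=float)
    gains = np.asarray(gains, dtype=float)
    # Power contributed by every voice to the chosen microphone: (J, F, T).
    contributions = gains[mic][:, None, None] * psd
    # Share of the target voice in the total power of that microphone.
    return contributions[voice] / (contributions.sum(axis=0) + eps)


# Toy data: 2 voices, 2 microphones, 3 frequency bins, 4 time frames.
rng = np.random.default_rng(0)
psd = rng.random((2, 3, 4)) + 0.1
gains = np.array([[1.0, 0.3], [0.2, 1.0]])
mic_stft = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))

mask = wiener_mask(psd, gains, mic=0, voice=0)
voice_estimate = mask * mic_stft  # interference-reduced spectrogram
```

Because the masks for all voices in a given microphone sum to one, suppressing interference more strongly also removes more of the mixture's energy, which is one way to picture the trade-off between distortion and separation that the abstract mentions.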

Rachel M. Bittner

MUSDB18 - a corpus for music separation
