Supervised and semi-supervised source separation algorithms based on non-negative matrix factorization have been shown to be quite effective. However, they require isolated training examples of one or more sources, which are often difficult to obtain. This limits the practical applicability of these algorithms. We examine the problem of efficiently utilizing general training data in the absence of specific training examples. Specifically, we propose a method to learn a universal speech model from a general corpus of speech and show how to use this model to separate speech from other sound sources. This model is used in lieu of a speech model trained on speaker-dependent training examples, and thus circumvents the aforementioned problem. Our experimental results show that our method achieves nearly the same performance as when speaker-dependent training examples are used. Furthermore, we show that our method improves performance when training data of the non-speech source is available.
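The abstract does not spell out the model, so as a rough illustration of the semi-supervised NMF recipe it builds on, here is a minimal Python/NumPy sketch: a speech dictionary is pre-trained on a general corpus, held fixed while a noise dictionary is learned from the mixture, and a Wiener-style soft mask recovers the speech. The paper's actual universal speech model is more elaborate than this plain-NMF stand-in, and all names and dimensions below are illustrative assumptions.

```python
import numpy as np

def nmf(V, W, H, n_iter=200, n_fixed=0, eps=1e-9):
    """Euclidean NMF via multiplicative updates; the first n_fixed
    columns of W (a pre-trained dictionary) are held fixed."""
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W_upd = W * (V @ H.T) / (W @ H @ H.T + eps)
        W_upd[:, :n_fixed] = W[:, :n_fixed]      # keep fixed columns intact
        W = W_upd
    return W, H

rng = np.random.default_rng(0)
F, k_s, k_n = 513, 40, 20                        # freq bins, speech/noise ranks

# 1) Train a general speech dictionary on corpus magnitude spectrograms.
V_corpus = rng.random((F, 1000))                 # stand-in for real corpus STFTs
W_s, _ = nmf(V_corpus, rng.random((F, k_s)), rng.random((k_s, 1000)))

# 2) Separate: fix the speech columns, learn noise columns from the mixture.
V_mix = rng.random((F, 300))                     # stand-in for mixture STFT
W0 = np.hstack([W_s, rng.random((F, k_n))])
W, H = nmf(V_mix, W0, rng.random((k_s + k_n, 300)), n_fixed=k_s)

# 3) Wiener-style soft mask built from the speech part of the reconstruction.
speech_mag = (W[:, :k_s] @ H[:k_s]) / (W @ H + 1e-9) * V_mix
```

The soft mask in step 3 preserves the mixture phase implicitly: it is applied to the magnitude spectrogram, and resynthesis would reuse the mixture's phase.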
In this paper we present a novel approach for isolating and removing sounds from dense monophonic mixtures. The approach is user-guided, and requires the presentation of a guide sound that mimics the desired target the user wishes to extract. The guide sound can be produced simply by the user vocalizing, or otherwise replicating, the target sound marked for separation. Using that guide as a prior in a statistical model of sound mixtures, we propose a methodology that allows us to efficiently extract complex structured sounds from dense mixtures.
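The abstract leaves the statistical model unspecified, so purely as a hedged illustration of how a guide recording can act as a prior, the sketch below uses it in two ways: spectrally, by learning an NMF dictionary from the guide to seed the target source, and temporally, by zeroing the target's activations in frames where the guide is silent. The assumption that the guide is time-aligned with the mixture belongs to this sketch, not to the paper, and the names are hypothetical.

```python
import numpy as np

def nmf(V, W, H, n_iter=200, eps=1e-9):
    """Plain Euclidean NMF by multiplicative updates (zeros in H persist)."""
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
F, T, k_t, k_b = 513, 400, 30, 30         # bins, frames, target/background ranks

V_guide = rng.random((F, T))              # stand-in: guide, time-aligned to mix
V_mix = rng.random((F, T))                # stand-in: mixture spectrogram

# Spectral prior: a dictionary learned from the guide seeds the target source.
W_g, _ = nmf(V_guide, rng.random((F, k_t)), rng.random((k_t, T)))

# Temporal prior: frames where the guide is silent get zero target activation;
# multiplicative updates never revive an entry initialized to zero.
active = V_guide.sum(axis=0) > 0.1 * V_guide.sum(axis=0).max()
H0 = rng.random((k_t + k_b, T))
H0[:k_t, ~active] = 0.0

W, H = nmf(V_mix, np.hstack([W_g, rng.random((F, k_b))]), H0)
target_mag = (W[:, :k_t] @ H[:k_t]) / (W @ H + 1e-9) * V_mix
```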
Audio stories are an engaging form of communication that combine speech and music into compelling narratives. Existing audio editing tools force story producers to manipulate speech and music tracks via tedious, low-level waveform editing. In contrast, we present a set of tools that analyze the audio content of the speech and music and thereby allow producers to work at a much higher level. Our tools address several challenges in creating audio stories, including (1) navigating and editing speech, (2) selecting appropriate music for the score, and (3) editing the music to complement the speech. Key features include a transcript-based speech editing tool that automatically propagates edits in the transcript text to the corresponding speech track; a music browser that supports searching based on emotion, tempo, key, or timbral similarity to other songs; and music retargeting tools that make it easy to combine sections of music with the speech. We have used our tools to create audio stories from a variety of raw speech sources, including scripted narratives, interviews and political speeches. Informal feedback from first-time users suggests that our tools are easy to learn and greatly facilitate the process of editing raw footage into a final story.
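As a hedged sketch of how a transcript edit can propagate to the waveform, the snippet below assumes word-level timestamps from a forced aligner; the alignment format and function names are hypothetical, not the paper's API. Deleting a word in the transcript maps to splicing out its aligned span of samples.

```python
import numpy as np

# Hypothetical word-level alignment (word, start_sec, end_sec), e.g. from a
# forced aligner; the structure here is an illustrative assumption.
alignment = [("thanks", 0.00, 0.42), ("um", 0.42, 0.71),
             ("for", 0.71, 0.95), ("listening", 0.95, 1.60)]

def cut_words(audio, sr, alignment, words_to_remove):
    """Delete words from the waveform by splicing out their aligned spans."""
    keep = np.ones(len(audio), dtype=bool)
    for word, start, end in alignment:
        if word in words_to_remove:
            keep[int(start * sr):int(end * sr)] = False
    return audio[keep]

sr = 16000
audio = np.zeros(int(1.6 * sr))              # placeholder waveform
edited = cut_words(audio, sr, alignment, {"um"})
```

A production editor would also place short crossfades at each splice point to avoid audible clicks; the sketch omits that step.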
The world is filled with important but visually subtle signals. A person's pulse, the breathing of an infant, the sag and sway of a bridge: these all create visual patterns that are too subtle to see with the naked eye. We present Eulerian Video Magnification, a computational technique for visualizing subtle color and motion variations in ordinary videos by making the variations larger. It is a microscope for small changes that are hard or impossible for us to see by ourselves. In addition, these small changes can be quantitatively analyzed and used to recover sounds from vibrations in distant objects, characterize material properties, and remotely measure a person's pulse.
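As a minimal, hedged rendering of the Eulerian idea (amplify temporal variations per pixel rather than track motion), the sketch below band-passes each pixel's time series with an ideal FFT filter and adds the amplified band back to the video. The published method applies this per level of a spatial pyramid and with more careful filtering; this stand-in skips the spatial decomposition.

```python
import numpy as np

def magnify(frames, fs, f_lo, f_hi, alpha):
    """Toy Eulerian magnification: temporally band-pass every pixel with an
    ideal (FFT) filter and add the amplified band back in.
    frames: (T, H, W) float array; fs: frame rate in Hz."""
    F = np.fft.rfft(frames, axis=0)
    freqs = np.fft.rfftfreq(frames.shape[0], d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    F[~band] = 0.0                           # keep only the band of interest
    subtle = np.fft.irfft(F, n=frames.shape[0], axis=0)
    return frames + alpha * subtle           # amplify the filtered variation

# e.g., amplify a resting pulse (~1 Hz) in a 30 fps clip
frames = np.random.default_rng(0).random((90, 64, 64))   # 3 s placeholder video
out = magnify(frames, fs=30.0, f_lo=0.8, f_hi=1.2, alpha=50.0)
```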
We present a semi-supervised source separation methodology to denoise speech by modeling speech as one source and noise as the other. We model speech using the recently proposed non-negative hidden Markov model, which uses multiple non-negative dictionaries and a Markov chain to jointly model the spectral structure and temporal dynamics of speech. We perform separation of the speech and noise using the recently proposed non-negative factorial hidden Markov model. Although the speech model is learned from training data, the noise model is learned during the separation process and requires no training data. We show that the proposed method achieves superior results to non-negative spectrogram factorization, which ignores the non-stationarity and temporal dynamics of speech.
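As a loudly hedged skeleton of the non-negative HMM idea (one dictionary per hidden state, with states chained by a Markov model), the sketch below scores each frame by how well each state's dictionary reconstructs it, using a crude clipped least-squares fit, and Viterbi-decodes a state path. The actual model performs joint probabilistic inference over states, activations, and sources, which this toy omits; all names are illustrative.

```python
import numpy as np

def decode_states(V, dicts, trans, eps=1e-12):
    """Viterbi over HMM states, where each state owns its own non-negative
    dictionary and a frame's cost is that dictionary's reconstruction error."""
    T, S = V.shape[1], len(dicts)
    nll = np.empty((S, T))                         # per-state, per-frame cost
    for s, W in enumerate(dicts):
        H, *_ = np.linalg.lstsq(W, V, rcond=None)
        H = np.clip(H, 0.0, None)                  # crude non-negative fit
        nll[s] = ((V - W @ H) ** 2).sum(axis=0)
    delta = np.zeros((S, T))
    psi = np.zeros((S, T), dtype=int)
    delta[:, 0] = nll[:, 0]
    for t in range(1, T):
        step = delta[:, t - 1][:, None] - np.log(trans + eps)   # (from, to)
        psi[:, t] = step.argmin(axis=0)
        delta[:, t] = step[psi[:, t], np.arange(S)] + nll[:, t]
    path = np.empty(T, dtype=int)                  # backtrack the best path
    path[-1] = delta[:, -1].argmin()
    for t in range(T - 2, -1, -1):
        path[t] = psi[path[t + 1], t + 1]
    return path

rng = np.random.default_rng(2)
dicts = [rng.random((513, 10)) for _ in range(5)]  # one dictionary per state
trans = np.full((5, 5), 0.1) + 0.5 * np.eye(5)     # sticky rows, each sums to 1
path = decode_states(rng.random((513, 60)), dicts, trans)
```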