2016 24th European Signal Processing Conference (EUSIPCO)
DOI: 10.1109/eusipco.2016.7760220
Two multimodal approaches for single microphone source separation

Abstract: In this paper, the problem of single-microphone source separation via Nonnegative Matrix Factorization (NMF), exploiting video information, is addressed. The audio and video modalities of a single human speech signal usually exhibit similar temporal variation: changes in one usually correspond to changes in the other. It is therefore expected that the activation coefficient matrices of their NMF decompositions are similar. Based on this similarity, in this paper the activation coefficient matri…
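The abstract builds on the standard NMF decomposition of a nonnegative matrix V (e.g. an audio magnitude spectrogram) as V ≈ WH, where H holds the activation coefficients that the paper couples across modalities. As a minimal illustration of that underlying decomposition only — a sketch using generic multiplicative updates, not the paper's multimodal method (the audio/video coupling is omitted) — one could write:

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF (Frobenius loss): V (m x n, nonnegative) ~= W (m x r) @ H (r x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # Standard multiplicative updates keep W and H nonnegative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy nonnegative "spectrogram": 8 frequency bins x 20 frames, exactly rank 2.
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 20))
W, H = nmf(V, r=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the paper's setting, an analogous factorization would be computed for the video features, with a penalty encouraging the two H matrices to agree; that coupling term is the paper's contribution and is not shown here.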

Cited by 17 publications (10 citation statements) · References 15 publications
“…Audio-visual source separation The idea of guiding audio source separation using visual information can be traced back to [15,27], where mutual information is used to learn the joint distribution of the visual and auditory signals, then applied to isolate human speakers. Subsequent work explores audio-visual subspace analysis [62,67], NMF informed by visual motion [61,65], statistical convolutive mixture models [64], and correlating temporal onset events [8,52]. Recent work [62] attempts both localization and separation simultaneously; however, it assumes a moving object is present and only aims to decompose a video into background (assumed low-rank) and foreground sounds/pixels.…”
Section: Audio-visual Representation Learning
confidence: 99%
“…Audio-visual source separation is a task that utilizes visual information to guide sound source separation, in contrast to blind sound source separation that does not use visual information [20][21][22][23]. There are various methods employed for audio-visual source separation: NMF [24][25][26], subspace methods [27,28], the mix-and-separate method [8-10, 12, 29], and the use of facial information for speech separation [30][31][32][33]. Many works have attempted simultaneous sound source separation and sound source localization [8][9][10]12], as it is important to identify which objects are producing sound to perform sound source separation.…”
Section: Audio-visual Representation Learning
confidence: 99%
“…As done for videos, mutual information maximization has been used to perform source separation in a user-assisted fashion by identifying the source spatially. Recent methods perform this within the NMF-based source separation framework [124,137]. Several other approaches deal with both object segmentation and source separation together in a completely unsupervised manner.…”
Section: AV Object Localization and Extraction
confidence: 99%