2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2017.7951787

Motion informed audio source separation

Abstract: In this paper we tackle the problem of single-channel audio source separation driven by descriptors of the sounding object's motion. As opposed to previous approaches, motion is included as a soft-coupling constraint within the nonnegative matrix factorization framework. The proposed method is applied to a multimodal dataset of string quartet performance recordings, where bow motion information is used to separate the string instruments. We show that the approach offers better source separation …
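The abstract describes tying motion descriptors to the NMF activations through a soft penalty rather than a hard constraint. The sketch below is a minimal illustration of that general idea under stated assumptions, not the paper's actual algorithm: it assumes a Euclidean NMF cost with a hypothetical quadratic coupling term lam * ||H - M||^2 that pulls each component's activations H toward a nonnegative motion-activity matrix M (e.g., bow-velocity envelopes resampled to the spectrogram frame rate). The function name, the choice of divergence, and all parameters are illustrative.

```python
import numpy as np

def motion_informed_nmf(V, M, n_components, lam=0.5, n_iter=200, eps=1e-9):
    """Factorize a magnitude spectrogram V (freq x time) as W @ H, softly
    coupling the activations H (components x time) to a nonnegative
    motion-activity matrix M of the same shape as H.

    Hypothetical objective (Euclidean NMF plus quadratic coupling):
        ||V - W @ H||_F^2 + lam * ||H - M||_F^2
    """
    F, T = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, T)) + eps

    for _ in range(n_iter):
        # Standard multiplicative update for W (Lee-Seung, Euclidean cost).
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Update for H including the soft motion coupling: the penalty's
        # gradient splits into a positive part (lam * H) in the denominator
        # and a negative part (lam * M) in the numerator, so nonnegativity
        # is preserved as long as M is nonnegative.
        H *= (W.T @ V + lam * M) / (W.T @ W @ H + lam * H + eps)

    return W, H

# Hypothetical usage: V is a mixture magnitude spectrogram, M holds one
# bow-activity envelope per source (one NMF component per source here).
# W, H = motion_informed_nmf(V, M, n_components=M.shape[0], lam=0.5)
```

In a setup like this, individual sources would then be reconstructed by grouping components per instrument and applying soft (Wiener-style) masks built from each group's partial reconstruction W_k @ H_k relative to the full model W @ H.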

Cited by 45 publications (28 citation statements). References 21 publications.
“…Audio-Visual Source Separation Early methods for audio-visual source separation focus on mutual information [10], subspace analysis [42,34], matrix factorization [33,39], and correlated onsets [5,27]. Recent methods leverage deep learning for separating speech [8,31,3,11], musical instruments [52,13,51], and other objects [12].…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Audio-visual source separation The idea of guiding audio source separation using visual information can be traced back to [15,27], where mutual information is used to learn the joint distribution of the visual and auditory signals, then applied to isolate human speakers. Subsequent work explores audio-visual subspace analysis [62,67], NMF informed by visual motion [61,65], statistical convolutive mixture models [64], and correlating temporal onset events [8,52]. Recent work [62] attempts both localization and separation simultaneously; however, it assumes a moving object is present and only aims to decompose a video into background (assumed low-rank) and foreground sounds/pixels.…”
Section: Audio-visual Representation Learning (citation type: mentioning)
confidence: 99%
“…Recent work [62] attempts both localization and separation simultaneously; however, it assumes a moving object is present and only aims to decompose a video into background (assumed low-rank) and foreground sounds/pixels. Prior methods nearly always tackle videos of people speaking or playing musical instruments [8,12,15,27,52,61,62,64]-domains where salient motion signals accompany audio events (e.g., a mouth or a violin bow starts moving, a guitar string suddenly accelerates). Some studies further assume side cues from a written musical score [52], require that each sound source has a period when it alone is active [12], or use ground-truth motion captured by MoCap [61].…”
Section: Audio-visual Representation Learning (citation type: mentioning)
confidence: 99%
“…• Visually Informed Source Separation: Audio events (e.g., a violin note) are often associated with visual movements (e.g., a bowing motion) [5]. Designing methods that can leverage visual information for source separation is an interesting task.…”
Section: B. New Tasks Using Both Audio and Visual Modalities (citation type: mentioning)
confidence: 99%