“…Audio-Visual Source Separation. Early methods for audio-visual source separation focus on mutual information [10], subspace analysis [42,34], matrix factorization [33,39], and correlated onsets [5,27]. Recent methods leverage deep learning for separating speech [8,31,3,11], musical instruments [52,13,51], and other objects [12].…”