There is a good amount of similarity between source separation approaches that use spectrograms captured from multiple microphones and computer vision algorithms that use multiple images for segmentation problems. Just as one would use Markov random fields (MRF) to solve image segmentation problems, we propose a method of modeling source separation using MRFs, and then solving such problems via common MRF inference methods. To this end, as a preprocessing, we convert stereophonic spectrograms into a integrated form based on their inter-channel level differences (ILD), which is a procedure analogous to getting a disparity map from stereo images for matching problems. Given the ILD matrix as an observed image, we estimate latent labels which stand for the responsibility of each spectrogram's time/frequency bin to a specific sound source. It is shown that the proposed method shows reasonable separation performance in a variety of mixing environments including online separation and moving sources. We expect this new way of formulating source separation problems to help exploit advantages of probabilistic graphical models and the recent advances in low-power, high-performance hardware suited for such tasks.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.