Prior information about the target source can improve audio source separation quality, but it is usually not available with the necessary level of alignment to the audio. This has limited its usability in the past. We propose a separation model that can nevertheless exploit such weak information for the separation task, using an attention mechanism that aligns it to the mixture as a byproduct. We demonstrate the capabilities of the model on a singing voice separation task, exploiting artificial side information with different levels of expressiveness. Moreover, we highlight an issue with the common procedure for assessing separation quality in parts where targets or predictions are silent, and we refine a previous contribution to obtain a more complete evaluation.
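To illustrate how an attention mechanism can yield an alignment as a byproduct, here is a minimal sketch using generic scaled dot-product attention. It is not the architecture proposed in this work, which the abstract does not specify. The function names, the feature dimension, and the toy data are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(mixture_frames, side_info):
    """Scaled dot-product attention (generic, not the paper's exact model).

    Each mixture frame acts as a query over the side-information sequence.
    Returns (context, weights), where weights[n][m] is how strongly mixture
    frame n attends to side-information element m. Read row by row, the
    weights form a soft alignment of the side information to the mixture.
    """
    dim = len(side_info[0])
    context, weights = [], []
    for query in mixture_frames:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in side_info
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of side-information vectors for this frame.
        context.append(
            [sum(w[m] * side_info[m][d] for m in range(len(side_info)))
             for d in range(dim)]
        )
    return context, weights


# Hypothetical toy data: two side-information elements, three mixture frames.
side_info = [[4.0, 0.0], [0.0, 4.0]]
mixture_frames = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]

context, weights = attend(mixture_frames, side_info)

# A hard alignment can be read off by taking the strongest weight per frame.
alignment = [max(range(len(w)), key=w.__getitem__) for w in weights]
```

In this toy case the first two frames attend mostly to the first side-information element and the last frame to the second, so the side information does not need to be time-aligned in advance. In a separation model, the context vectors would typically be passed on to the network that estimates the target source.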