Abstract: Music source separation is the task of decomposing music into its constitutive components, e.g., yielding separated stems for the vocals, bass, and drums. Such a separation has many applications, ranging from rearranging/repurposing the stems (remixing, repanning, upmixing) to full extraction (karaoke, sample creation, audio restoration). Music separation has a long history of scientific activity, as it is known to be a very challenging problem. In recent years, deep learning-based systems, for the first time, yielded…
“…The paper proposes a new SID model extending from CRNN and involving the use of melody information by leveraging CREPE [6]. Also, a data augmentation method called shuffle-and-remix is adopted to avoid the confounds from the accompaniments by using source separation [12]. Our evaluation shows that both melody information and data augmentation improve the result, especially the latter.…”
Section: Discussion (mentioning, confidence: 96%)
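As a concrete illustration of the melody-information step quoted above, the sketch below extracts a vocal melody contour with the CREPE pip package. The input file name and the confidence threshold used to mask unvoiced frames are our own assumptions, not details from the paper.

# Sketch: vocal melody contour extraction with CREPE.
import crepe
import librosa
import numpy as np

# CREPE operates on 16 kHz mono audio; "vocals.wav" is a placeholder.
audio, sr = librosa.load("vocals.wav", sr=16000, mono=True)

# Viterbi decoding smooths frame-wise pitch estimates into a contour.
time, frequency, confidence, activation = crepe.predict(audio, sr, viterbi=True)

# Heuristic (our assumption): zero out low-confidence, likely unvoiced frames.
frequency = np.where(confidence > 0.5, frequency, 0.0)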
“…In contrast, in our work both the SS model and the SID model employ deep learning. Specifically, we use open-unmix [12], an open-source three-layer bidirectional deep recurrent neural network for SS. Moreover, we build our SID model upon the implementation of a convolutional recurrent neural network made available by Nasrullah and Zhao [17], which attains the highest song-level F1-score of 0.67 on the per-album split of the artist20 dataset [18], a standard dataset for SID.…”
Section: Conv Block (mentioning, confidence: 99%)
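For reference, running open-unmix as the separation step can look like the following sketch via torch.hub. The model name "umxhq" and the tensor layout follow the sigsep/open-unmix-pytorch repository; treat both as assumptions rather than the quoted paper's exact setup.

# Sketch: separating a mixture into stems with Open-Unmix.
import torch
import torchaudio

# Open-Unmix separators are trained on 44.1 kHz stereo; "mixture.wav" is a placeholder.
audio, sr = torchaudio.load("mixture.wav")              # (channels, samples)
separator = torch.hub.load("sigsep/open-unmix-pytorch", "umxhq")
separator.eval()

with torch.no_grad():
    # The separator expects a batch dimension and returns one stem per target.
    estimates = separator(audio.unsqueeze(0))           # (batch, targets, channels, samples)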
“…For simplicity, we use the same design for the mel-spectrogram branch and the melody contour branch. Second, instead of using the mel-spectrogram of the mixture audio recordings, we employ open-unmix [12] to remove the instrumental part of the music, and use the proposed data augmentation technique to increase the size of the training data, as described below.…”
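The two-branch design this excerpt describes can be sketched as below: an identical conv block is reused for the mel-spectrogram and melody-contour inputs, as the quote states, but every size here (channel counts, GRU width, number of singers) is an illustrative assumption, not the paper's configuration.

# Sketch: two identical conv branches feeding a recurrent classifier.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Same design reused for both branches, per the excerpt.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

class TwoBranchCRNN(nn.Module):
    def __init__(self, n_singers=20):
        super().__init__()
        self.mel_branch = nn.Sequential(conv_block(1, 32), conv_block(32, 64))
        self.contour_branch = nn.Sequential(conv_block(1, 32), conv_block(32, 64))
        self.gru = nn.GRU(input_size=128, hidden_size=64, batch_first=True)
        self.out = nn.Linear(64, n_singers)

    def forward(self, mel, contour):
        # mel, contour: (batch, 1, freq, time) with matching shapes.
        m = self.mel_branch(mel).mean(dim=2)              # pool freq -> (batch, 64, time')
        c = self.contour_branch(contour).mean(dim=2)
        x = torch.cat([m, c], dim=1).transpose(1, 2)      # (batch, time', 128)
        _, h = self.gru(x)
        return self.out(h[-1])                            # (batch, n_singers)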
Identifying singers is an important task with many applications. However, the task remains challenging due to several issues. One major issue concerns the confounding factors from the background instrumental music that is mixed with the vocals in music production. A singer identification model may learn to extract non-vocal features from the instrumental part of the songs if a singer only sings in certain musical contexts (e.g., genres). The model therefore cannot generalize well when the singer sings in unseen contexts. In this paper, we attempt to address this issue. Specifically, we employ open-unmix, an open-source tool with state-of-the-art performance in source separation, to separate the vocal and instrumental tracks of music. We then investigate two means to train a singer identification model: learning from the separated vocals only, or from an augmented set of data where we "shuffle-and-remix" the separated vocal tracks and instrumental tracks of different songs to artificially make the singers sing in different contexts. We also incorporate melodic features learned from the vocal melody contour for better performance. Evaluation results on a benchmark dataset called artist20 show that this data augmentation method greatly improves the accuracy of singer identification.
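A minimal sketch of the shuffle-and-remix idea, assuming the vocal and instrumental tracks have already been separated (e.g., by open-unmix) and loaded as NumPy arrays. The truncation-based length alignment and the unit remix gain are our simplifications, not the paper's recipe.

# Sketch: remix each vocal with an accompaniment from a different song.
import random
import numpy as np

def shuffle_and_remix(vocals: list, accompaniments: list, seed: int = 0) -> list:
    """Pair each separated vocal with a randomly chosen accompaniment
    from another song, placing singers in unfamiliar musical contexts."""
    rng = random.Random(seed)
    remixed = []
    for i, voc in enumerate(vocals):
        j = rng.choice([k for k in range(len(accompaniments)) if k != i])
        acc = accompaniments[j]
        n = min(len(voc), len(acc))          # align lengths by truncation
        remixed.append(voc[:n] + acc[:n])    # simple additive remix
    return remixed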
“…A major design choice in music source separation models is whether (1) to train a separate model for each instrument [12], (2) to use a single class-conditional model, or (3) to use an instrument-agnostic approach [16]. Our approach aims to combine the advantages of the first two: the high precision of independent models with the improved optimization via parameter sharing of single models.…”
We propose a hierarchical meta-learning-inspired model for music source separation (Meta-TasNet) in which a generator model is used to predict the weights of individual extractor models. This enables efficient parameter sharing while still allowing for instrument-specific parameterization. Meta-TasNet is shown to be more effective than models trained independently or in a multi-task setting, and achieves performance comparable to state-of-the-art methods. In comparison to the latter, our extractors contain fewer parameters and have faster run-time performance. We discuss important architectural considerations, and explore the costs and benefits of this approach.
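The generator/extractor split can be illustrated with a toy hypernetwork in PyTorch: a shared generator maps an instrument embedding to the weights of a small per-instrument linear extractor, so parameter sharing lives in the generator while specialization lives in the generated weights. This is a sketch of the idea only; Meta-TasNet generates weights for full TasNet-style extractors, and all sizes below are assumptions.

# Sketch: a generator network predicting per-instrument extractor weights.
import torch
import torch.nn as nn

class MetaExtractor(nn.Module):
    def __init__(self, n_instruments=4, emb_dim=32, feat_dim=64):
        super().__init__()
        self.instrument_emb = nn.Embedding(n_instruments, emb_dim)
        # Generator predicts a (feat_dim x feat_dim) weight matrix plus a bias
        # for one linear extractor layer, conditioned on the instrument.
        self.generator = nn.Linear(emb_dim, feat_dim * feat_dim + feat_dim)
        self.feat_dim = feat_dim

    def forward(self, features, instrument_id):
        # features: (batch, time, feat_dim); instrument_id: (batch,)
        params = self.generator(self.instrument_emb(instrument_id))
        w, b = params[:, :-self.feat_dim], params[:, -self.feat_dim:]
        w = w.view(-1, self.feat_dim, self.feat_dim)
        # Apply the generated extractor: bias + features @ w^T per sample.
        return torch.baddbmm(b.unsqueeze(1), features, w.transpose(1, 2))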