2018 26th European Signal Processing Conference (EUSIPCO)
DOI: 10.23919/eusipco.2018.8553571
Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders

Abstract: Supervised multi-channel audio source separation requires extracting useful spectral, temporal, and spatial features from the mixed signals. The success of many existing systems is therefore largely dependent on the choice of features used for training. In this work, we introduce a novel multi-channel, multi-resolution convolutional auto-encoder neural network that works on raw time-domain signals to determine appropriate multi-resolution features for separating the singing voice from stereo music. Our experimen…
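As a rough illustration of the kind of architecture the abstract describes, the sketch below builds a toy multi-resolution convolutional auto-encoder in PyTorch that analyses a raw stereo waveform with parallel 1-D convolutions of several kernel sizes and maps the concatenated features back to a stereo estimate of one source. The layer widths, kernel sizes, and single-source output head are illustrative assumptions, not the network published in the paper.

import torch
import torch.nn as nn

class MultiResAutoEncoder(nn.Module):
    """Toy multi-resolution convolutional auto-encoder on raw stereo audio.

    Parallel 1-D convolutions with different kernel sizes act as learned
    analysis filter banks at several time resolutions; a 1x1 convolution
    mixes the concatenated features back into a stereo waveform estimate
    for one target source (e.g. the singing voice).
    """

    def __init__(self, channels=2, kernel_sizes=(16, 64, 256), feat=32):
        super().__init__()
        # One encoder branch per resolution; padding keeps the length close
        # to the input length (trimmed exactly in forward()).
        self.branches = nn.ModuleList(
            [nn.Conv1d(channels, feat, k, padding=k // 2) for k in kernel_sizes]
        )
        self.act = nn.ReLU()
        self.decoder = nn.Conv1d(feat * len(kernel_sizes), channels, kernel_size=1)

    def forward(self, mix):
        # mix: (batch, 2, samples) raw stereo waveform
        feats = [self.act(b(mix))[..., : mix.shape[-1]] for b in self.branches]
        return self.decoder(torch.cat(feats, dim=1))

est = MultiResAutoEncoder()(torch.randn(1, 2, 16384))   # -> (1, 2, 16384)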

Cited by 40 publications (23 citation statements). References 29 publications (45 reference statements).
“…The proposed MCGN is compared with seven baseline methods, including the standard DNN method from [11], the DNN method with skip connections (S-DNN) from [10], the LSTM model used in [16], the BLSTM model used in [15], the CNN-based methods, the MRCAE method from [20], and the GRN method in [15]. The parameters of the CRN model are set by following [22].…”
Section: B. Baselines and Parameters (mentioning)
Confidence: 99%
“…Another promising direction has been the exploitation of convolutional neural networks (CNNs), such as [19], where a convolutional encoder-decoder (CED) is introduced to estimate the mapping relation between the noisy mixture and the target speech. This is further improved by learning multi-resolution features with a multi-resolution convolutional auto-encoder (MRCAE) model [20], by learning with dilated convolutions to enlarge the receptive field of the network as in WaveNet, and by learning with a gated mechanism to control the information flow among the layers [21]. Furthermore, the gated recurrent network (GRN) method is used with dilated 2-D convolutional layers to enlarge the receptive fields in the time-frequency (T-F) domain [15].…”
Section: Introduction (mentioning)
Confidence: 99%
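To make the dilated-convolution and gating ideas mentioned in this excerpt concrete, here is a minimal, self-contained sketch of a WaveNet-style gated, dilated 1-D convolution block in PyTorch. It is not the layer used in [20], [21], or [15]; the channel count, kernel size, and residual connection are assumptions chosen for brevity.

import torch
import torch.nn as nn

class GatedDilatedConv1d(nn.Module):
    """Gated, dilated 1-D convolution block (illustrative only).

    Stacking such blocks with dilations 1, 2, 4, ... grows the receptive
    field exponentially with depth, while the tanh/sigmoid gate controls
    how much information each layer passes on.
    """

    def __init__(self, channels=32, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2   # keep the signal length unchanged
        self.filter = nn.Conv1d(channels, channels, kernel_size,
                                padding=pad, dilation=dilation)
        self.gate = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)

    def forward(self, x):
        # Residual connection keeps the input path intact.
        return x + torch.tanh(self.filter(x)) * torch.sigmoid(self.gate(x))

x = torch.randn(1, 32, 1024)
for d in (1, 2, 4, 8):   # receptive field grows to 1 + 2*(1+2+4+8) = 31 samples
    x = GatedDilatedConv1d(dilation=d)(x)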
“…Recent literature shows that deep learning models operating on raw audio waveforms can achieve satisfactory results for several audio-based tasks [17,18,19]. Among those, some have also recently started to address the problem of music source separation directly in the waveform domain [20,21]. Stoller et al. [20] proposed the Wave-U-Net (see Section 2.3 for more information), and Grais et al. [21] proposed a multi-resolution CNN auto-encoder for singing-voice source separation. Unfortunately, though, these recent articles do not include any perceptual study comparing waveform-based models with spectrogram-based ones.…”
Section: Introduction (mentioning)
Confidence: 99%
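For readers unfamiliar with the waveform-domain models cited above, the toy sketch below shows the general waveform-in, waveform-out encoder/decoder shape with a skip connection that Wave-U-Net popularised. It is a single-level stand-in under assumed layer sizes, not the actual Wave-U-Net of Stoller et al. [20].

import torch
import torch.nn as nn

class TinyWaveUNet(nn.Module):
    """Minimal 1-D encoder/decoder with one skip connection (illustrative).

    Wave-U-Net uses many downsampling/upsampling levels with learned
    resampling; this toy version has a single level just to show the
    waveform-to-waveform structure and the skip connection.
    """

    def __init__(self, ch=2, feat=16):
        super().__init__()
        self.down = nn.Conv1d(ch, feat, kernel_size=15, stride=2, padding=7)
        self.up = nn.ConvTranspose1d(feat, feat, kernel_size=15, stride=2,
                                     padding=7, output_padding=1)
        self.out = nn.Conv1d(feat + ch, ch, kernel_size=1)   # fuse skip + features

    def forward(self, mix):
        h = torch.relu(self.down(mix))    # (B, feat, T/2)
        u = torch.relu(self.up(h))        # back to (B, feat, T)
        return self.out(torch.cat([u, mix], dim=1))

est = TinyWaveUNet()(torch.randn(1, 2, 16384))   # -> (1, 2, 16384)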
“…The mixture spectrogram is fed as the input to the network, and an estimated T-F mask for each source is produced as the output. Although those networks have been successful in music separation, they typically rely on the effectiveness of independent modeling for each source [16,18,19]. Such independent networks do not scale well when the separation task involves many sources.…”
Section: Introduction (mentioning)
Confidence: 99%
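The excerpt above summarises spectrogram-masking systems: the network sees the mixture spectrogram and outputs a T-F mask per source. The sketch below shows only the masking step itself, using an oracle ratio mask computed from synthetic stand-in signals purely for illustration; in the cited systems the mask would be predicted by a network from the mixture.

import numpy as np
from scipy.signal import stft, istft

fs, n = 16000, 16000
voice = np.random.randn(n) * 0.1     # stand-ins for real source signals
accomp = np.random.randn(n) * 0.1
mixture = voice + accomp

# STFTs of the target source and of the mixture.
_, _, V = stft(voice, fs, nperseg=1024)
_, _, X = stft(mixture, fs, nperseg=1024)

# Oracle ratio mask in the T-F domain (a network would estimate this in practice).
mask = np.clip(np.abs(V) / (np.abs(X) + 1e-8), 0.0, 1.0)

# Apply the mask to the mixture spectrogram and invert back to a waveform.
_, voice_est = istft(mask * X, fs, nperseg=1024)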