2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP)
DOI: 10.1109/globalsip.2017.8309164
Single channel audio source separation using convolutional denoising autoencoders

Abstract: Deep learning techniques have been used recently to tackle the audio source separation problem. In this work, we propose to use deep fully convolutional denoising autoencoders (CDAEs) for monaural audio source separation. We use as many CDAEs as the number of sources to be separated from the mixed signal. Each CDAE is trained to separate one source and treats the other sources as background noise. The main idea is to allow each CDAE to learn suitable spectral-temporal filters and features to its corresponding …
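As a rough illustration of this setup, here is a minimal sketch assuming a two-source task and arbitrary layer sizes (not the paper's exact configuration): each source gets its own convolutional encoder-decoder that takes the mixture's magnitude spectrogram and is trained to output only that source, treating everything else as noise.

```python
# Sketch: one CDAE per source; each maps the mixture spectrogram to one target source.
# Layer sizes, patch shapes, and the two-source setup are illustrative assumptions.
import torch
import torch.nn as nn

class CDAE(nn.Module):
    """Convolutional encoder-decoder that denoises a mixture into one target source."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1), nn.ReLU(),
        )

    def forward(self, mix_spec):            # mix_spec: (batch, 1, freq, time)
        return self.decoder(self.encoder(mix_spec))

# One CDAE per source to be separated (source names here are placeholders).
cdaes = {name: CDAE() for name in ["vocals", "accompaniment"]}

mixture = torch.rand(8, 1, 128, 128)        # dummy magnitude-spectrogram patches
estimates = {name: net(mixture) for name, net in cdaes.items()}
```

At separation time, the mixture is simply passed through every CDAE, and each network's output is taken as the estimate of its corresponding source.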

Cited by 87 publications (50 citation statements) · References 32 publications
“…A variety of networks have been successfully applied to two-source separation problems, including LSTMs and bidirectional LSTMs (BLSTMs) [2,4], U-Nets [15], Wasserstein GANs [16], and fully convolutional network (FCN) encoder-decoders followed by a BLSTM [17]. For multi-source separation, a variety of architectures have been used that directly generate a mask for each source, including BLSTMs [6,9], CNNs [18], DenseNets followed by an LSTM [19], separate encoder-decoder networks for each source [20], joint one-to-many encoder-decoder networks with one decoder per source [21], and TDCNs with learnable analysis… [Figure 1: Architecture for mask-based separation experiments. We vary the mask network and analysis/synthesis transforms.]”
Section: Prior Work (mentioning)
confidence: 99%
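For contrast with the one-network-per-source approach of the present paper, here is a minimal sketch of the mask-based setup this excerpt describes: a single network predicts one mask per source, and each source estimate is the mixture spectrogram weighted by its mask. The tiny CNN and shapes below are illustrative assumptions, not any specific cited architecture.

```python
# Sketch of mask-based separation: predict one mask per source, apply each to the mixture.
# The small CNN below is a placeholder, not an architecture from the cited works.
import torch
import torch.nn as nn

n_sources = 2
mask_net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, n_sources, kernel_size=3, padding=1),
)

mixture = torch.rand(8, 1, 128, 128)                 # dummy magnitude-spectrogram patches
masks = torch.softmax(mask_net(mixture), dim=1)      # masks sum to 1 across sources
estimates = masks * mixture                          # (8, n_sources, 128, 128) source estimates
```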
“…Previous source separation work has focused on speech enhancement and speech separation [6,16,22,23]. Small datasets used for the non-speech multi-source separation setting have included distress sounds from DCASE 2017 [18], and speech and music in SiSEC-2015 [17,20]. Singing voice separation has focused on vocal and music instrument tracks [15,24].…”
Section: Prior Work (mentioning)
confidence: 99%
“…Neural network based regression methods have been used to solve music separation and speech separation in [6,7,8,10,11]. Regression based source separation methods learn a mapping from a mixture of sources to a target source to be separated.…”
Section: Regression Based Source Separation (mentioning)
confidence: 99%
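A minimal sketch of that regression formulation, assuming magnitude-spectrogram inputs and a placeholder network: the model maps the mixture directly to the clean target-source spectrogram and is trained with a mean-squared-error loss.

```python
# Sketch of regression-based separation: learn a mapping mixture -> target source,
# trained with MSE against the clean target spectrogram. Model and data are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(                               # stand-in for any separation network
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1), nn.ReLU(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

mixture = torch.rand(8, 1, 128, 128)                 # dummy mixture spectrograms
target = torch.rand(8, 1, 128, 128)                  # dummy clean target-source spectrograms

prediction = model(mixture)
loss = loss_fn(prediction, target)                   # regression objective
optimizer.zero_grad()
loss.backward()
optimizer.step()
```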
“…On the other hand, convolutional layers, as used in convolutional neural networks (CNNs), use a set of small filters and share their weights among all locations in the data (e.g., LeCun et al. 1998b), which allows them to better capture local features. As a result, CNNs generally have two or more orders of magnitude fewer parameters than analogous fully connected neural networks (e.g., Grais & Plumbley 2017) and require far fewer training resources such as memory and time. Furthermore, multiple convolutional layers can easily be stacked to extract sophisticated higher-level features by composing the lower-level ones obtained in previous layers.…”
Section: Convolutional Denoising Autoencoder (mentioning)
confidence: 99%
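A quick back-of-the-envelope check of that parameter argument (sizes are arbitrary assumptions): a convolutional layer's parameter count depends only on its small shared filters, while a fully connected layer between the same input and output sizes scales with the product of their dimensions.

```python
# Rough parameter-count comparison for the weight-sharing argument above.
# Sizes are arbitrary; the point is the orders-of-magnitude gap.
freq, time = 128, 128                     # spectrogram patch size
in_ch, out_ch, k = 1, 16, 3               # channels and filter size

conv_params = out_ch * (in_ch * k * k) + out_ch                            # filters + biases
dense_params = (freq * time * in_ch) * (freq * time * out_ch) + (freq * time * out_ch)

print(f"conv layer:  {conv_params:,} parameters")    # 160
print(f"dense layer: {dense_params:,} parameters")   # roughly 4.3 billion
```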
“…Among various deep learning algorithms, the autoencoder is a common type of neural network that aims at learning useful features from the input data in an unsupervised manner; it is usually used for dimensionality reduction (e.g., Hinton & Salakhutdinov 2006; Wang et al. 2014) and data denoising (e.g., Xie et al. 2012; Bengio et al. 2013; Lu et al. 2013). In particular, the convolutional denoising autoencoder (CDAE) is very flexible and powerful in capturing subtle and complicated features in the data and has been successfully applied to weak gravitational-wave signal denoising (e.g., Shen et al. 2017), monaural audio source separation (e.g., Grais & Plumbley 2017), and so on. These applications have demonstrated the outstanding ability of the CDAE to extract weak signals from highly temporally variable data.…”
Section: Introduction (mentioning)
confidence: 99%