2018
DOI: 10.1007/978-3-319-73031-8_7

Deep Neural Network Based Multichannel Audio Source Separation

Abstract: HAL is a multidisciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Cited by 7 publications (6 citation statements)
References 61 publications
“…As supervised speech source separation techniques, recently, deep neural network (DNN) based approaches with a training dataset in which there are microphone input signal and corresponding oracle clean data have been widely studied, e.g., deep clustering (DC) [10,11], permutation invariant training (PIT) [12,13], deep attractor network [14,15], and hybrid approaches with BSS [16][17][18]. DNN based approaches can capture complicated spectral characteristics of a speech source.…”
[Footnote in the citing paper: This work was done while Yoshiki Masuyama and Yu Nakagome were interns at LINE Corporation.]
Section: Introduction (mentioning)
confidence: 99%
“…in which there are microphone input signal and corresponding oracle clean data have been widely studied, e.g., deep clustering (DC) [10,11], permutation invariant training (PIT) [12,13], deep attractor network [14,15], and hybrid approaches with BSS [16][17][18]. DNN based approaches can capture complicated spectral characteristics of a speech source.…”
Section: Introduction (mentioning)
confidence: 99%
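
The citation statement above names permutation invariant training (PIT) without describing it. As a rough illustration only, and not the implementation used in any of the cited papers, the core idea can be sketched as scoring the network outputs against the references under every possible source ordering and keeping the lowest loss. The function names `mse` and `pit_loss`, the mean-squared-error criterion, and the toy signals are all assumptions made for this sketch.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, references):
    """Return (lowest loss, best permutation) over all source orderings.

    estimates, references: lists of equal-length sequences, one per source.
    best_perm[i] is the index of the reference matched to estimates[i].
    Hypothetical helper for illustration; the loss criterion is an assumption.
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average the per-source error under this assignment.
        loss = sum(mse(estimates[i], references[p]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the two outputs come back in swapped order.
refs = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
ests = [[0.1, 1.9, 0.0], [0.9, 0.0, 1.1]]
loss, perm = pit_loss(ests, refs)
print(perm, loss)
```

The exhaustive search grows factorially with the number of sources, so this form is only practical for a handful of speakers.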
“…Nonlinear models, such as deep neural networks (DNNs), are therefore highly applicable because of their ability to identify nonlinear structures in audio signals [11][12][13]. Additionally, recurrent neural networks (RNNs) that exhibit the temporal behaviour of a time sequence can be trained to predict time-frequency masks for target signals and separate sources from a mixed waveform [14].…”
Section: Introduction (mentioning)
confidence: 99%
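
The statement above refers to time-frequency masks predicted by a network. A minimal sketch of how such a mask is applied follows, assuming magnitude spectrograms stored as nested lists (frequency bins by frames) and assuming that source magnitudes add in the mixture, which is a simplification. In place of a trained RNN, the mask here is a ratio mask computed from known sources; `ratio_mask` and `apply_mask` are hypothetical names, not taken from the cited work.

```python
def ratio_mask(target_mag, interferer_mag, eps=1e-8):
    """Ratio mask in [0, 1]: the target's share of the energy in each bin.

    A trained network would predict this mask from the mixture alone;
    here it is computed from known sources as a stand-in.
    """
    return [
        [t / (t + i + eps) for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_mag, interferer_mag)
    ]


def apply_mask(mask, mixture_mag):
    """Element-wise product of the mask and the mixture magnitudes."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_mag)
    ]


# Toy 2x2 magnitude spectrograms (rows: frequency bins, columns: frames).
target = [[2.0, 0.0], [1.0, 3.0]]
noise = [[0.0, 1.0], [1.0, 1.0]]
# Assumption: magnitudes add in the mixture.
mixture = [[t + n for t, n in zip(t_row, n_row)] for t_row, n_row in zip(target, noise)]

mask = ratio_mask(target, noise)
estimate = apply_mask(mask, mixture)
print(estimate)
```

In a full system the masked magnitudes would be combined with the mixture phase and inverted back to a waveform; that step is omitted here.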
“…First and foremost, Gaussian processes realizations may not explore more than a few standard deviations, which means that Bayesian inference in these models is intrinsically very sensitive to initialization, since the probability mass is almost everywhere negligible. A common workaround is to further constrain covariance models through shallow (Ozerov et al [31]) or deep (Nugraha et al [30]) parametric constraints, but another complementary route is to simply opt for heavy-tail models, for which much more robust inference is possible. For instance, multivariate Laplace filters (Wang et al [41]) were successfully used for robust detection and result from Bayesian inference in a state-space model where some variables are Laplace distributed.…”
Section: Introduction (mentioning)
confidence: 99%