2016
DOI: 10.1109/taslp.2016.2558822
A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks

Cited by 83 publications (35 citation statements)
References: 52 publications
“…This idea is depicted in Figure 1 where we learn a model to recover the filter bank (FBANK) features from the mixed FBANK features and then feed each stream of the recovered FBANK features to a conventional LVCSR system for recognition. In the simplest architecture, which is denoted as Arch#1 and illustrated in Figure 1(a), feature separation can be considered as a multi-class regression problem, similar to many previous works [29], [30], [31], [32], [33], [34]. In this architecture, Y, the features of the mixed speech, are used as the input to deep learning models, such as deep neural networks (DNNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) recurrent neural networks (RNNs), to estimate the feature representation of each individual talker.…”
Section: A. Feature Separation With Direct Supervision (mentioning)
confidence: 99%
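As a rough illustration of the "Arch#1" direct-regression idea quoted above, the following is a minimal sketch, assuming PyTorch and an arbitrary two-talker, 40-dimensional FBANK setup; the layer sizes, the context window, and the name FeatureSeparationDNN are illustrative assumptions, not the cited paper's configuration.

```python
import torch
import torch.nn as nn

# Minimal sketch of the direct-regression architecture described in the excerpt:
# a feed-forward DNN maps the mixed-speech FBANK features Y to the FBANK features
# of each individual talker. All sizes here are assumptions for illustration only.
class FeatureSeparationDNN(nn.Module):
    def __init__(self, n_fbank=40, context=11, hidden=1024, n_talkers=2):
        super().__init__()
        in_dim = n_fbank * context       # spliced mixed-speech input frames
        out_dim = n_fbank * n_talkers    # one recovered FBANK stream per talker
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, mixed_fbank):
        # mixed_fbank: (batch, n_fbank * context)
        return self.net(mixed_fbank)

# Training reduces to multi-output regression, e.g. an MSE loss between the
# estimated and the reference per-talker FBANK features.
model = FeatureSeparationDNN()
criterion = nn.MSELoss()
```

Each recovered feature stream would then be passed to a conventional LVCSR system, as the excerpt describes.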
“…Secondly, a sampling method needs to be defined for eqns. (9) and (11) that incorporates both clean and noisy inputs. To do this, a pair of corresponding clean and noisy spectra sequences, z, is sampled, from which G generates a fake sample x̃_t.…”
Section: Discriminator Architecture (mentioning)
confidence: 99%
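The sampling step described in this excerpt could look roughly like the sketch below, assuming PyTorch; the generator G, the clean/noisy pairing, and the helper name sample_fake are hypothetical placeholders rather than the cited paper's procedure, and eqns. (9) and (11) are not reproduced here.

```python
import torch

# Hedged sketch: draw a corresponding (clean, noisy) spectra pair z and let the
# generator G produce a fake sample x̃_t from the noisy input, giving the
# discriminator a real (clean) and a fake (enhanced) sample to compare.
# G and the data layout are assumptions, not the cited paper's API.
def sample_fake(G, clean, noisy):
    with torch.no_grad():      # no generator update during this sampling step
        x_fake = G(noisy)      # generator's estimate of the clean spectra
    return clean, x_fake       # real and fake samples for the discriminator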
“…Such situations require the ability to separate the voice of a particular speaker from the mixed audio signal of others. Several proposed systems have shown significant performance improvements on the separation task when prior information about the speakers in a mixture is given [1], [2]. This, however, is still challenging when no prior information about the speakers is available, a problem known as speaker-independent speech separation.…”
Section: Introduction (mentioning)
confidence: 99%