Interspeech 2021
DOI: 10.21437/interspeech.2021-1482

DCCRN+: Channel-Wise Subband DCCRN with SNR Estimation for Speech Enhancement

Abstract: Deep complex convolution recurrent network (DCCRN), which extends CRN with complex structure, has achieved superior performance in MOS evaluation in Interspeech 2020 deep noise suppression challenge (DNS2020). This paper further extends DCCRN with the following significant revisions. We first extend the model to sub-band processing where the bands are split and merged by learnable neural network filters instead of engineered FIR filters, leading to a faster noise suppressor trained in an end-to-end manner. The…
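The learnable band split/merge mentioned in the abstract can be sketched very roughly with strided convolutions standing in for the learned filters. This is an illustrative assumption, not the paper's actual architecture; the class name, band count, and tap length below are made up for the example.

```python
import torch
import torch.nn as nn

class LearnableSubbands(nn.Module):
    """Sketch of learnable band split/merge (sizes are illustrative,
    not the configuration used in DCCRN+)."""
    def __init__(self, bands=4, taps=64):
        super().__init__()
        # Analysis: one learned FIR-like filter per band, applied with a
        # stride equal to the number of bands (critical decimation).
        self.split = nn.Conv1d(1, bands, kernel_size=taps,
                               stride=bands, padding=taps // 2, bias=False)
        # Synthesis: a transposed convolution merges the decimated bands
        # back to a full-band waveform of the original length.
        self.merge = nn.ConvTranspose1d(bands, 1, kernel_size=taps,
                                        stride=bands, padding=taps // 2,
                                        bias=False)

    def forward(self, x):           # x: (batch, 1, samples)
        subbands = self.split(x)    # (batch, bands, ~samples // bands)
        return self.merge(subbands)

x = torch.randn(2, 1, 16000)       # one second of 16 kHz audio
y = LearnableSubbands()(x)         # same shape as the input
```

Because both filter banks are ordinary convolution layers, they train jointly with the rest of the enhancement network, which is the end-to-end property the abstract highlights over engineered FIR filters.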

Cited by 48 publications (7 citation statements)
References 28 publications
“…Further modifications include the introduction of convolutional pathways, with a filter kernel of 1 × 1, between the encoder and decoder through skip connections, an idea similar to the DCCRN+ presented in [19]. However, instead of performing concatenation along the feature dimension, we propose to sum the output of the convolutional path with the output of the preceding deconvolutional layer before feeding it to the next deconvolutional layer.…”
Section: Further Modifications to the Network Architecture and Loss F...
confidence: 99%
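The two skip-connection variants contrasted in the excerpt above can be sketched as follows; the channel and time dimensions are illustrative assumptions, not values from either paper.

```python
import torch
import torch.nn as nn

# Illustrative shapes: (batch, channels, time).
enc_feat = torch.randn(2, 64, 100)       # encoder output
dec_feat = torch.randn(2, 64, 100)       # preceding decoder-layer output

skip = nn.Conv1d(64, 64, kernel_size=1)  # 1x1 convolutional pathway

# DCCRN+-style: concatenate along the feature (channel) dimension,
# doubling the channel count seen by the next decoder layer.
concat_in = torch.cat([skip(enc_feat), dec_feat], dim=1)  # (2, 128, 100)

# Alternative proposed in the citing paper: sum instead of concatenating,
# so the next layer's input width stays unchanged.
sum_in = skip(enc_feat) + dec_feat                        # (2, 64, 100)
```

Summation keeps the decoder layers narrower than concatenation does, at the cost of forcing the encoder and decoder features to share one channel space.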
“…DNSMOS is used for model training and model selection during noise-suppression development. DNSMOS is also used in ablation studies for noise suppressors [22,23]. DNSMOS has been quite popular, with over a hundred researchers using it within several months of its release.…”
Section: Related Work
confidence: 99%
“…In the time-frequency processing domain, it has been clarified that, besides amplitude features, phase features also matter: speech quality improves significantly when the phase is accurately estimated [20]. Therefore, several studies have developed phase-aware enhancement methods [21]–[26], of which the most successful are based on the concept of the complex ideal ratio mask (cIRM) [27]. With the development of complex-valued deep neural networks [28], estimating the cIRM from the complex-valued noisy time-frequency representation [23]–[26] has become possible.…”
Section: Introduction
confidence: 99%
“…Therefore, several studies have developed phase-aware enhancement methods [21]–[26], of which the most successful are based on the concept of the complex ideal ratio mask (cIRM) [27]. With the development of complex-valued deep neural networks [28], estimating the cIRM from the complex-valued noisy time-frequency representation [23]–[26] has become possible. However, the unbounded nature of the cIRM makes optimization difficult due to the infinite search space [29].…”
Section: Introduction
confidence: 99%
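One common remedy for the unbounded cIRM, following the compressed mask of Williamson et al.'s cIRM paper, is to squash each real and imaginary component with a scaled sigmoid-like function before using it as a training target; the constants K and C below are the commonly cited defaults, but treat the exact values as an assumption.

```python
import numpy as np

def compress(m, K=10.0, C=0.1):
    """Map an unbounded mask component into (-K, K)."""
    return K * (1.0 - np.exp(-C * m)) / (1.0 + np.exp(-C * m))

def uncompress(m_c, K=10.0, C=0.1):
    """Invert the compression to recover the original mask component."""
    return (-1.0 / C) * np.log((K - m_c) / (K + m_c))

# Even extreme mask values land inside (-K, K), and the mapping is
# invertible, so the network can predict bounded targets.
m = np.array([-50.0, -1.0, 0.0, 1.0, 50.0])
m_c = compress(m)
m_back = uncompress(m_c)
```

Bounding the targets this way shrinks the search space the excerpt describes as infinite, while the inverse mapping restores a usable complex mask at inference time.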