ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414569

Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

Abstract: The deep complex U-Net and convolutional recurrent network (CRN) structures achieve state-of-the-art performance for monaural speech enhancement. Both are encoder-decoder structures with skip connections, which rely heavily on the representation power of the complex-valued convolutional layers. In this paper, we propose a complex convolutional block attention module (CCBAM) to boost the representation power of the complex-valued convolutional layers by constructing more infor…
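As a rough illustration of the idea behind the CCBAM, the sketch below applies CBAM-style channel and spatial attention jointly to the real and imaginary feature maps of a complex-valued layer. The module layout, reduction factor, and the way the real/imaginary parts are stacked are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of a CBAM-style attention block extended to complex-valued
# features (real/imaginary pairs). The exact CCBAM formulation may differ.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                    # x: (B, C, F, T)
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        w = torch.sigmoid(avg + mx)          # per-channel weights, (B, C)
        return x * w[:, :, None, None]


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # channel-wise average map
        mx = x.amax(dim=1, keepdim=True)     # channel-wise max map
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w


class ComplexCBAM(nn.Module):
    """Channel + spatial attention over real and imaginary parts jointly
    (illustrative; the paper's CCBAM may combine them differently)."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(2 * channels)  # real & imag stacked on channels
        self.sa = SpatialAttention()

    def forward(self, real, imag):
        x = torch.cat([real, imag], dim=1)
        x = self.sa(self.ca(x))
        r, i = torch.chunk(x, 2, dim=1)
        return r, i
```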

Cited by 39 publications (27 citation statements)
References 20 publications
“…The skip connections facilitate optimization by connecting each block in the encoder to its corresponding block in the decoder. We further add the attention block CCBAM [7] on the skip pathway to facilitate information flow. Note that we take all convolutions to be causal in time by applying asymmetrical paddings.…”
Section: The Overall Architecture
confidence: 99%
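The causal-in-time convolution mentioned in the statement above can be obtained by padding only the past side of the time axis before a standard convolution. The sketch below shows one such asymmetric-padding wrapper; the kernel sizes, strides, and (B, C, F, T) tensor layout are assumptions for illustration, not the cited paper's exact code.

```python
# Sketch of a causal 2-D convolution: pad frames only on the "past" side of
# the time axis so no future frames contribute to the output.
import torch.nn as nn
import torch.nn.functional as F


class CausalConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel=(3, 2), stride=(2, 1)):
        super().__init__()
        self.time_pad = kernel[1] - 1       # asymmetric: pad past frames only
        self.freq_pad = kernel[0] // 2      # symmetric padding along frequency
        self.conv = nn.Conv2d(in_ch, out_ch, kernel, stride)

    def forward(self, x):                   # x: (B, C, F, T)
        x = F.pad(x, (self.time_pad, 0, self.freq_pad, self.freq_pad))
        return self.conv(x)
```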
“…It is a signal-level loss and operates directly on the signal itself. To further guide the learning of the cIRM estimation, we also considered the mean squared error (MSE) losses of the real (Mr) and imaginary (Mi) estimates of the cIRM [7]. Specifically, we optimize the FRCRN model by the following joint loss function:…”
Section: Joint Loss Function
confidence: 99%
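A hedged sketch of such a joint loss is shown below: a signal-level term (here SI-SNR, a common choice for this kind of loss) plus MSE terms on the real (Mr) and imaginary (Mi) cIRM estimates. The weighting factor and the exact signal-level loss used in FRCRN and in [7] may differ from this illustration.

```python
# Illustrative joint loss: signal-level SI-SNR term + MSE on the real and
# imaginary cIRM estimates. Weights and loss choices are assumptions.
import torch
import torch.nn.functional as F


def si_snr_loss(est, ref, eps=1e-8):
    # est, ref: (B, T) time-domain signals
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj
    ratio = proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps)
    return -10.0 * torch.log10(ratio + eps).mean()


def joint_loss(est_wav, ref_wav, mr_est, mi_est, mr_ref, mi_ref, alpha=0.5):
    signal_term = si_snr_loss(est_wav, ref_wav)                     # signal-level loss
    mask_term = F.mse_loss(mr_est, mr_ref) + F.mse_loss(mi_est, mi_ref)  # cIRM MSE terms
    return signal_term + alpha * mask_term
```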
“…The baseline systems used for performance comparison are the Optimally Modified Log Spectral Amplitude (OMLSA) method [23], attention-wave-U-net [24], Attention-DCUNet [25], and TSTNN [26]. The setup of the baseline systems is detailed below:…”
Section: Baselines
confidence: 99%
“…Inspired by recent work demonstrating that DNNs implementing complex operators [58] may outperform previous architectures in many audio-related tasks, new state-of-the-art performances were achieved on speech enhancement using complex representations of audio data [14,15]. Recent work was able to further improve these approaches by introducing a complex convolutional block attention module (CCBAM) and a mixed loss function [59].…”
Section: Audio Enhancement
confidence: 99%