ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9414177

FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement

Abstract: This paper proposes a full-band and sub-band fusion model, named FullSubNet, for single-channel real-time speech enhancement. Full-band and sub-band refer to models that take full-band and sub-band noisy spectral features as input and output full-band and sub-band speech targets, respectively. The sub-band model processes each frequency independently. Its input consists of one frequency and several context frequencies. The output is the prediction of the clean speech target for the corresponding frequency. These two…
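The abstract's description of the sub-band input (one frequency plus several context frequencies) can be pictured as a sliding window over the frequency axis of a spectrogram. The NumPy sketch below illustrates that idea under stated assumptions: the function name `subband_inputs`, the default of 15 neighbors on each side, and reflect padding at the spectrum edges are hypothetical choices for illustration, not taken from the paper's code.

```python
import numpy as np


def subband_inputs(mag, num_neighbors=15):
    """Build one sub-band input per frequency from a spectrogram.

    mag: array of shape (F, T), frequency bins x frames.
    Returns shape (F, 2 * num_neighbors + 1, T): slice f holds bins
    f - N ... f + N, with the target frequency in the middle row.
    Edge bins are reflect-padded (an assumption; other choices exist).
    """
    n = num_neighbors
    padded = np.pad(mag, ((n, n), (0, 0)), mode="reflect")
    num_freqs = mag.shape[0]
    return np.stack([padded[f:f + 2 * n + 1] for f in range(num_freqs)])


# Usage on a hypothetical 257-bin, 100-frame magnitude spectrogram.
rng = np.random.default_rng(0)
mag = np.abs(rng.standard_normal((257, 100)))
sb = subband_inputs(mag, num_neighbors=15)
print(sb.shape)  # (257, 31, 100)
```

Each of the 257 slices can then be handed to a sub-band network one frequency at a time, which is what allows every frequency to be processed independently while still seeing its local spectral context.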

Cited by 125 publications (57 citation statements).
References 22 publications.
“…In addition, the inter-channel magnitude/intensity difference plays an especially important role for binaural localization, as the intensity difference of binaural signals can reflect the torso/head shadow effect of signal propagation. In order to promote the localization performance, the recently proposed FullSubNet [16] is adopted to predict the complex ideal ratio mask and enhance the complex speech spectrograms. Accounting for the following DP-RTF learning, the clean direct-path sound is taken as the target signal, which means both noise reduction and dereverberation are conducted.…”
Section: Monaural Enhancement (mentioning)
confidence: 99%
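The excerpt above states that FullSubNet is used to predict the complex ideal ratio mask (cIRM). As a minimal NumPy sketch of what such a mask is, assuming the common definition M = S / Y for a clean STFT S and a noisy STFT Y (the function names and the small `eps` regularizer are illustrative assumptions), the mask can be computed and applied as follows. Per the excerpt, S would be the clean direct-path component, so the ideal mask removes both noise and reverberation.

```python
import numpy as np


def complex_ratio_mask(clean, noisy, eps=1e-8):
    """Ideal complex ratio mask M with M * noisy ~= clean.

    Computed as clean * conj(noisy) / |noisy|^2, i.e. clean / noisy,
    with eps guarding against division by zero.
    """
    return clean * np.conj(noisy) / (np.abs(noisy) ** 2 + eps)


def apply_mask(mask, noisy):
    """Complex multiplication changes both magnitude and phase of each bin."""
    return mask * noisy


# Usage on two hand-picked time-frequency bins.
clean = np.array([1 + 1j, 2 - 1j])
noise = np.array([0.5 + 0j, -1 + 2j])
noisy = clean + noise
mask = complex_ratio_mask(clean, noisy)
estimate = apply_mask(mask, noisy)
print(np.allclose(estimate, clean))  # True
```

Unlike a real-valued magnitude mask, a complex mask can also correct the phase of each bin. Practical systems often compress the mask's range before training; that step is omitted here.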
“…The enhanced speech would be definitely helpful for DP-RTF estimation. In this work, we adopt the network architecture of the monaural speech enhancement method in [16]. This enhancement method is modified to recover the clean direct-path magnitude and phase spectrograms from the contaminated ones, instead of recovering the noise-free signals.…”
Section: Introduction (mentioning)
confidence: 99%
“…As the authors did not use a studio with sound isolation, the dataset contains some environment noise. For our experiments we resample the audios to 16 kHz and use the FullSubNet model [34] as denoiser. For development we randomly selected 500 samples and the rest of the dataset was used for training.…”
Section: Audio Datasets (mentioning)
confidence: 99%
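The preprocessing described in this excerpt (resampling to 16 kHz, then holding out 500 random samples for development) can be sketched in Python as below. This is a minimal illustration under assumptions: SciPy's `resample_poly` is one standard polyphase resampler, not necessarily what the authors used; the helper names, the 44.1 kHz source rate, and the fixed random seed are hypothetical; the FullSubNet denoising step itself is not shown.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def resample_to(x, orig_sr, target_sr=16000):
    """Polyphase resampling by the rational factor target_sr / orig_sr."""
    g = gcd(orig_sr, target_sr)
    return resample_poly(x, target_sr // g, orig_sr // g)


def split_train_dev(items, n_dev=500, seed=0):
    """Randomly hold out n_dev items for development; the rest are for training."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))
    dev = [items[i] for i in order[:n_dev]]
    train = [items[i] for i in order[n_dev:]]
    return train, dev


# Usage with synthetic data: a 1 s, 44.1 kHz tone and 2000 dummy file names.
sr = 44100
t = np.arange(sr) / sr
x_16k = resample_to(np.sin(2 * np.pi * 440 * t), sr)
print(len(x_16k))  # 16000
train, dev = split_train_dev([f"clip_{i:04d}.wav" for i in range(2000)])
print(len(train), len(dev))  # 1500 500
```

Reducing the ratio by its greatest common divisor (here 160/441) keeps the polyphase filter bank as small as possible.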
“…We utilized the small part in our experiments, specifically, 9h worth (2477 utterances) for training and the remaining 1h (286 utterances) for inference and evaluation. This division between the training and evaluation sets, i.e., into 9h and 1h sets, is given in [24] and its code³.…”
Section: Libri-light [24] (mentioning)
confidence: 99%
“…For instance, convolutional neural networks (CNNs) [1] have been shown to be better than using a short-time Fourier transform (STFT) and inverse STFT (ISTFT) for building an encoder and decoder [2]. Furthermore, methods that utilize recurrent neural networks (RNNs)-based models have been shown to be capable of real-time processing [3][4][5]. In addition, there are hybrid methods that exploit the benefits of both types of network, i.e., real-time processing and high performance [6,7].…”
Section: Introduction (mentioning)
confidence: 99%
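The excerpt contrasts learned CNN encoders and decoders with a fixed STFT/ISTFT pair. As a minimal NumPy sketch of that fixed pair (the FFT size of 512, hop of 256, periodic Hann window, and edge padding are illustrative assumptions), the STFT encodes a waveform into complex time-frequency bins and a weighted overlap-add ISTFT decodes it back; with no processing in between, the round trip reconstructs the input up to floating-point error.

```python
import numpy as np

N_FFT, HOP = 512, 256  # assumed frame size and hop (50% overlap)
WINDOW = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window


def stft(x):
    """Encoder: zero-pad, frame, window, and take the real FFT of each frame."""
    pad = N_FFT // 2
    n_frames = int(np.ceil((len(x) + 2 * pad - N_FFT) / HOP)) + 1
    padded = np.zeros(N_FFT + HOP * (n_frames - 1))
    padded[pad:pad + len(x)] = x
    frames = np.stack([padded[i * HOP:i * HOP + N_FFT] * WINDOW
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape (n_frames, N_FFT // 2 + 1)


def istft(spec, length):
    """Decoder: inverse FFT per frame, then weighted overlap-add (WOLA)."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=-1)
    total = N_FFT + HOP * (frames.shape[0] - 1)
    y = np.zeros(total)
    norm = np.zeros(total)
    for i, frame in enumerate(frames):
        y[i * HOP:i * HOP + N_FFT] += frame * WINDOW   # synthesis window
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2   # accumulated window energy
    covered = norm > 1e-8
    y[covered] /= norm[covered]
    pad = N_FFT // 2
    return y[pad:pad + length]


rng = np.random.default_rng(0)
x = rng.standard_normal(16000)  # 1 s of white noise at an assumed 16 kHz
X = stft(x)
y = istft(X, len(x))
print(X.shape, np.allclose(x, y))  # (64, 257) True
```

A learned encoder/decoder replaces these fixed transforms with trainable layers. A frame-wise causal model placed between them, such as an RNN that updates its state once per frame, can process frames as they arrive, with an algorithmic latency of about one window (512 samples, i.e. 32 ms at 16 kHz, under the assumed settings).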