ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413580

Towards Efficient Models for Real-Time Deep Noise Suppression

Abstract: With recent research advancements, deep learning models are becoming attractive and powerful choices for speech enhancement in real-time applications. While state-of-the-art models can achieve outstanding results in terms of speech quality and background noise reduction, the main challenge is to obtain compact enough models that are resource-efficient at inference time. An important but often neglected aspect of data-driven methods is that results can only be convincing when tested on real-world data an…

Cited by 70 publications (52 citation statements). References 22 publications (31 reference statements).

Citation statements:
“…Training Data for SE: We utilized a large-scale and high-quality simulated dataset described in [24], which includes around 1,000 hours of paired speech samples. As a clean speech corpus, the dataset collects 544 hours of speech recordings with high mean opinion score (MOS) values from the LibriVox corpus [25].…”
Section: Datasets (citation type: mentioning; confidence: 99%)
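
As context for this excerpt, the following is a minimal NumPy sketch of how a paired noisy/clean training sample is typically simulated for speech enhancement: a clean utterance is mixed with a noise recording at a sampled SNR. The SNR range, the signal placeholders, and the function name mix_at_snr are illustrative assumptions, not the exact recipe of the dataset in [24].

import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale the noise so the mixture reaches the requested SNR, then add it to the clean speech."""
    # Loop or trim the noise to the length of the clean utterance.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]
    clean_power = np.mean(clean ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    # Gain that places the noise at the desired level relative to the speech.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise

# Illustrative usage: draw a random SNR and build one (noisy, clean) pair.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)   # stand-in for a clean LibriVox utterance
noise = rng.standard_normal(48000)   # stand-in for a noise recording
noisy = mix_at_snr(clean, noise, snr_db=rng.uniform(0.0, 40.0))
training_pair = (noisy, clean)       # network input / target for supervised SE training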
“…In addition, the clean speech in each mixture is convolved with an acoustic room impulse response (RIR) sampled from 7,000 measured and simulated responses. See [24] for details of this dataset. The data are available publicly, except for the 65 hours of the internal noise recordings.…”
Section: Datasets (citation type: mentioning; confidence: 99%)
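
The reverberation step mentioned in this excerpt amounts to convolving the dry clean speech with a sampled room impulse response. A minimal sketch using scipy.signal.fftconvolve follows; the toy exponentially decaying RIR and the length/level handling are illustrative assumptions, not the measured or simulated responses used in [24].

import numpy as np
from scipy.signal import fftconvolve

def reverberate(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve dry speech with a room impulse response, keeping the original length and level."""
    reverberant = fftconvolve(clean, rir, mode="full")[: len(clean)]
    # Rescale so the reverberant signal keeps roughly the energy of the dry speech.
    scale = np.sqrt(np.mean(clean ** 2) / (np.mean(reverberant ** 2) + 1e-12))
    return reverberant * scale

# Illustrative usage: pick one RIR from a pool (random data standing in for the
# 7,000 measured and simulated responses) and apply it before noise is added.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
rir = np.exp(-np.linspace(0.0, 8.0, 4000)) * rng.standard_normal(4000)  # toy decaying RIR
reverberant_clean = reverberate(clean, rir)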
“…1. We use the convolutional recurrent network for speech enhancement (CRUSE) proposed in [19], which…”
Section: Speech PSD Estimation (citation type: mentioning; confidence: 99%)
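
CRUSE is introduced in [19] as a convolutional recurrent network for speech enhancement. The sketch below shows a generic convolutional-encoder / GRU / deconvolutional-decoder mask estimator in that spirit; the channel counts, kernel sizes, 256-bin input, and single GRU layer are illustrative assumptions and not the published CRUSE configuration.

import torch
import torch.nn as nn

class ConvRecurrentEnhancer(nn.Module):
    """Toy conv-recurrent mask estimator: conv encoder -> GRU bottleneck -> deconv decoder."""

    def __init__(self, n_freq: int = 256):
        super().__init__()
        # Encoder: convolutions strided along the frequency axis only.
        self.enc1 = nn.Conv2d(1, 16, kernel_size=(4, 1), stride=(2, 1), padding=(1, 0))
        self.enc2 = nn.Conv2d(16, 32, kernel_size=(4, 1), stride=(2, 1), padding=(1, 0))
        # Recurrent bottleneck over the time axis (channels and frequency flattened).
        bottleneck = 32 * (n_freq // 4)
        self.gru = nn.GRU(bottleneck, bottleneck, batch_first=True)
        # Decoder: transposed convolutions mirroring the encoder.
        self.dec1 = nn.ConvTranspose2d(32, 16, kernel_size=(4, 1), stride=(2, 1), padding=(1, 0))
        self.dec2 = nn.ConvTranspose2d(16, 1, kernel_size=(4, 1), stride=(2, 1), padding=(1, 0))

    def forward(self, noisy_mag: torch.Tensor) -> torch.Tensor:
        # noisy_mag: (batch, 1, freq, time) magnitude spectrogram.
        x = torch.relu(self.enc1(noisy_mag))
        x = torch.relu(self.enc2(x))
        b, c, f, t = x.shape
        h, _ = self.gru(x.permute(0, 3, 1, 2).reshape(b, t, c * f))
        x = torch.relu(self.dec1(h.reshape(b, t, c, f).permute(0, 2, 3, 1)))
        mask = torch.sigmoid(self.dec2(x))
        return mask * noisy_mag  # enhanced magnitude estimate

# Illustrative usage on a random 256-bin, 100-frame spectrogram.
net = ConvRecurrentEnhancer(n_freq=256)
enhanced = net(torch.randn(1, 1, 256, 100).abs())  # -> torch.Size([1, 1, 256, 100])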
“…4. As baselines we have the unprocessed reference microphone, delay&sum and superdirective MVDR beamformers, the single-channel DNN (CRUSE) [19] applied on the reference mic, mask-based MVDR beamformer using the DNN-mask to adaptively update the noise covariance [21], and the competitive RLS-WPD [15] as the state-of-the-art online convolutional beamformer.…”
Section: Evaluation Setup (citation type: mentioning; confidence: 99%)
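
For reference alongside this list of baselines, a minimal NumPy sketch of the textbook MVDR weights and a mask-weighted noise covariance estimate (for one frequency bin) is given below; the steering vector, mask, and diagonal loading are illustrative assumptions, not the implementations evaluated in [21] or [15]. A delay-and-sum beamformer would simply use w = d / n_mics instead of the MVDR solution.

import numpy as np

def mvdr_weights(steering: np.ndarray, noise_cov: np.ndarray) -> np.ndarray:
    """MVDR weights for one bin: w = Phi_n^{-1} d / (d^H Phi_n^{-1} d)."""
    phi_inv_d = np.linalg.solve(noise_cov, steering)
    return phi_inv_d / (steering.conj() @ phi_inv_d)

def mask_based_noise_cov(stft_frames: np.ndarray, noise_mask: np.ndarray) -> np.ndarray:
    """Mask-weighted spatial covariance for one bin.

    stft_frames: (n_frames, n_mics) complex STFT of all microphones at this frequency.
    noise_mask:  (n_frames,) DNN noise-presence mask in [0, 1].
    """
    weighted = stft_frames * noise_mask[:, None]
    return (weighted.T @ stft_frames.conj()) / (noise_mask.sum() + 1e-8)

# Illustrative usage for a single frequency bin with 4 microphones.
rng = np.random.default_rng(0)
n_mics, n_frames = 4, 200
y = rng.standard_normal((n_frames, n_mics)) + 1j * rng.standard_normal((n_frames, n_mics))
mask = rng.uniform(size=n_frames)                    # stand-in for a DNN noise mask
d = np.exp(-2j * np.pi * rng.uniform(size=n_mics))   # stand-in steering vector
phi_n = mask_based_noise_cov(y, mask) + 1e-6 * np.eye(n_mics)  # diagonal loading for stability
w = mvdr_weights(d, phi_n)
enhanced_bin = y @ w.conj()                          # beamformer output over time for this bin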