2019
DOI: 10.1109/taslp.2018.2887337

Convolutional Neural Networks to Enhance Coded Speech

Abstract: Enhancing coded speech suffering from far-end acoustic background noise, quantization noise, and potentially transmission errors, is a challenging task. In this work we propose two postprocessing approaches applying convolutional neural networks (CNNs) either in the time domain or the cepstral domain to enhance the coded speech without any modification of the codecs. The time domain approach follows an end-to-end fashion, while the cepstral domain approach uses analysis-synthesis with cepstral domain features.…

Cited by 63 publications (43 citation statements)
References 45 publications
“…• The paper shows that a mask based post-filter in the spectral domain performs better than cepstral-domain post-filter (Cepstrum-CNN) as proposed in [12,13].…”
Section: Key Contribution Of This Paper | Type: mentioning
confidence: 95%
“…With the advent of deep learning, an increasing number of studies using deep neural networks (DNNs) for speech enhancement have shown that these models are able to significantly outperform classical and other machine learning-based methods in terms of speech quality and intelligibility [12][13][14][15][16][17][18][19][20][21]. This is especially true for non-stationary noise conditions, where deep learning-based methods have the advantage of making no assumptions on the stationarity of noise or the underlying distributions of speech and noise.…”
Section: Introduction | Type: mentioning
confidence: 99%
“…Park et al [27] demonstrate the effectivity of different variations of CEDs and Takahashi et al [20] introduce densely connected convolutional layers and multi-band processing into the architecture. A CED network has also been used by Zhao et al to enhance encoded and subsequently decoded speech in a postprocessing step, showing remarkable generalization capabilities even to unseen codecs [18].…”
Section: Introduction | Type: mentioning
confidence: 99%