ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9053283

Enhancement of Coded Speech Using a Mask-Based Post-Filter

Abstract: The quality of speech codecs deteriorates at low bitrates due to high quantization noise. A post-filter is generally employed to enhance the quality of the coded speech. In this paper, a data-driven post-filter relying on masking in the time-frequency domain is proposed. A fully connected neural network (FCNN), a convolutional encoder-decoder (CED) network and a long short-term memory (LSTM) network are implemented to estimate a real-valued mask per time-frequency bin. The proposed models were tested on the five l…
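As a rough illustration of what "a real-valued mask per time-frequency bin" means, the sketch below scales each bin of a coded spectrum by a non-negative gain and leaves the phase untouched. It is not the authors' implementation: the function names `oracle_mask` and `apply_mask`, the clipped magnitude-ratio target, the `max_gain` limit, and the toy data are all assumptions made for this example. In the paper the mask is estimated by a neural network rather than computed from the clean signal.

```python
import numpy as np


def oracle_mask(clean_mag, coded_mag, max_gain=2.0, eps=1e-8):
    """Hypothetical training target: per-bin magnitude ratio, clipped to [0, max_gain]."""
    return np.clip(clean_mag / (coded_mag + eps), 0.0, max_gain)


def apply_mask(coded_spec, mask):
    """Scale every time-frequency bin by a real-valued gain; the coded phase is kept."""
    return mask * coded_spec


# Toy spectrogram with 4 frequency bins and 5 frames (assumed shape, not from the paper).
rng = np.random.default_rng(0)
clean = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
coded = 0.8 * clean  # pretend the codec attenuated every bin by 20%

mask = oracle_mask(np.abs(clean), np.abs(coded))
enhanced = apply_mask(coded, mask)
```

In this toy case the mask is close to 1.25 in every bin, so the enhanced spectrum approximately recovers the clean one. With real codec distortion the phase error remains, because a real-valued mask only changes magnitudes.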

Cited by 10 publications (21 citation statements)
References 13 publications

“…The objective measure used in the experiments was the wideband PESQ scores [42], which are designed to mimic the ITU-T Recommendation P.800 Absolute Category Rating (ACR) Mean Opinion Score (MOS) test scores [48]. The average PESQ scores for the decoded signals, outputs of the mask-based post-filter in [17], outputs of the ffDNN-based post-processing without side information, and outputs of the proposed method utilizing side information for the HE-AAC operating at various bitrates are shown in Fig. 3.…”
Section: Results (mentioning)
confidence: 99%
“…The neural network models for side information extraction and post-processing can be any deep learning model, such as feed-forward DNN (ffDNN), CNN, RNN, and adversarial loss-based models in the time- or frequency-domain [14]-[17]. In this study, we employed the simplest model to confirm if neural network-based side information generation and post-processing are effective.…”
Section: Enhancement Of Coded Speech Using Dnn-based Side Information (mentioning)
confidence: 99%