Enhancement of Coded Speech Using a Mask-Based Post-Filter

Korse, Srikanth; Gupta, Krishnendu; Fuchs, Guillaume

doi:10.1109/icassp40776.2020.9053283

Cited by 10 publications

(21 citation statements)

References 13 publications

“…The objective measure used in the experiments was the wideband PESQ scores [42], which are designed to mimic the ITU-T Recommendation P.800 Absolute Category Rating (ACR) Mean Opinion Score (MOS) test scores [48]. The average PESQ scores for the decoded signals, outputs of the mask-based post-filter in [17], outputs of the ffDNN-based post-processing without side information, and outputs of the proposed method utilizing side information for the HE-AAC operating at various bitrates are shown in Fig. 3.…”

Section: Results (mentioning)

confidence: 99%

“…The neural network models for side information extraction and post-processing can be any deep learning model, such as feed-forward DNN (ffDNN), CNN, RNN, and adversarial loss-based models in the time- or frequency-domain [14]-[17]. In this study, we employed the simplest model to confirm if neural network-based side information generation and post-processing are effective.…”

Section: Enhancement of Coded Speech Using DNN-Based Side Information (mentioning)

confidence: 99%

“…It is noted that the current work focused on demonstrating the effectiveness of using neural network-based side information for coded speech enhancement, and computational complexity was not of primary concern. We compared the performance of the proposed method to that of the decoded signals for the same codec operating at higher bitrates enhanced with ffDNN-based post-processing without any side information and the mask-based post-filter proposed in [17]. The ffDNN structure for post-processing without side information was also 1024-1024-1024 units for three hidden layers, which was the same as that for the post-processing g in the proposed method, except for the input dimension.…”

Section: B. Compared Systems and Model Configurations (mentioning)

confidence: 99%
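The excerpts describe the compared ffDNN post-processor only by its three 1024-unit hidden layers, noting that it matches the proposed post-processing network g except for the input dimension. Below is a minimal, dependency-free sketch of such a fully connected network. It assumes ReLU hidden units, a linear output layer, 257 spectral bins (as from a 512-point FFT), and an 8-dimensional side-information vector; all of these are illustrative assumptions, not values from the excerpts. The sketch shows that appending side information to the input changes only the size of the first weight matrix.

```python
import random

def param_count(layer_sizes):
    """Number of weights plus biases in a fully connected net with these layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def init_mlp(layer_sizes, seed=0):
    """Random (untrained) weights; He-style scaling for ReLU layers is an assumption."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        std = (2.0 / n_in) ** 0.5
        weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def mlp_forward(layers, x):
    """ReLU on hidden layers, linear output layer (suited to regressing LPS values)."""
    for i, (weights, biases) in enumerate(layers):
        y = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = y if i == len(layers) - 1 else [max(0.0, v) for v in y]
    return x

N_BINS = 257      # hypothetical: one-sided spectrum of a 512-point FFT
SIDE_INFO = 8     # hypothetical side-information dimension
HIDDEN = [1024, 1024, 1024]

no_side_info = [N_BINS] + HIDDEN + [N_BINS]
with_side_info = [N_BINS + SIDE_INFO] + HIDDEN + [N_BINS]
print(param_count(no_side_info))                                  # 2626817
print(param_count(with_side_info) - param_count(no_side_info))    # 8192

# Tiny forward pass to check the plumbing without 1024-wide layers in pure Python.
toy = init_mlp([4, 8, 8, 8, 3])
print(len(mlp_forward(toy, [0.1, 0.2, 0.3, 0.4])))                # 3
```

Under these assumed dimensions, the extra input only adds `SIDE_INFO * 1024` weights to the first layer. The three hidden layers, which hold most of the parameters, stay identical.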

“…The ffDNN structure for post-processing without side information was also 1024-1024-1024 units for three hidden layers, which was the same as that for the post-processing g in the proposed method, except for the input dimension. The mask-based post-filter [17] had a convolutional encoder-decoder structure that takes the log magnitude spectra for the current and previous frames to estimate the ideal ratio masks bounded by 4 and 2 for HE-AAC and AMR-WB, respectively. The ffDNN-based post-processing without side information and the mask-based post-filter required approximately 197 WMOPS and 1187 WMOPS, respectively.…”

Section: B. Compared Systems and Model Configurations (mentioning)

confidence: 99%
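The mask-based post-filter [17] is described as estimating ideal ratio masks bounded by 4 (HE-AAC) and 2 (AMR-WB). Below is a minimal sketch of a bounded ratio-mask target and its application to a decoded magnitude spectrum. It assumes the mask is defined per frequency bin as the ratio of original to decoded magnitude, clipped to [0, bound]. The excerpt does not give the exact mask definition used in [17], and the function names are hypothetical. The convolutional encoder-decoder that predicts the mask is not shown.

```python
MASK_BOUND = {"HE-AAC": 4.0, "AMR-WB": 2.0}  # upper bounds quoted in the excerpt

def bounded_ratio_mask(orig_mag, decoded_mag, bound, eps=1e-8):
    """Per-bin target |S| / |S_hat|, clipped to [0, bound] (assumed definition)."""
    return [min(max(s / (d + eps), 0.0), bound) for s, d in zip(orig_mag, decoded_mag)]

def apply_mask(decoded_mag, mask):
    """Post-filtered magnitude; the decoded phase would be reused (assumption)."""
    return [m * d for m, d in zip(mask, decoded_mag)]

orig = [1.0, 8.0, 0.5]
decoded = [1.0, 1.0, 1.0]
mask = bounded_ratio_mask(orig, decoded, MASK_BOUND["HE-AAC"])
print([round(m, 6) for m in mask])    # [1.0, 4.0, 0.5]
print(apply_mask(decoded, mask)[1])   # 4.0
```

The second bin shows the effect of the bound: the true ratio of 8 is clipped to 4, so that bin is only partly restored. Clipping also keeps the target finite where the decoded magnitude is close to zero.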

“…Signal restoration using generative adversarial networks was proposed in [16], by assuming that the generative model is capable of recovering components lost by low bitrate coding. Mask-based post-filter was also investigated by employing a convolutional encoder-decoder [17]. In [18], [19], and [20], decoders of parametric coders were constructed using neural speech synthesizers based on WaveNet [21], SampleRNN [22], and LPCNet [23], respectively.…”

Section: Introduction (mentioning)

confidence: 99%

Enhancement of Coded Speech Using Neural Network-Based Side Information

et al. 2021

Audio codecs generate notable artifacts when operating at low bitrates, which degrade the quality of the coded audio significantly. There have been several approaches to enhance the quality of decoded signals with and without side information. While pre- or post-processing approaches without side information can be applied directly to existing systems without modifying codecs, approaches utilizing side information can further enhance the performance while maintaining backward-compatibility with existing codecs. In this paper, we propose a method to improve decoded signals using neural network-based side information. A neural network in the transmitter side that generates the side information and another neural network in the receiver side that estimates the log power spectra (LPS) of the original signal from the decoded signal and the side information are jointly trained to accurately reconstruct the original signal. In the same line with the analysis-by-synthesis, the neural network that generates the side information in the transmitter side takes not only the LPS of the original signal but also the LPS of the decoded signal as the input by decoding the encoded bitstream at the transmitter side. Experimental results show that the proposed audio codec enhancement scheme using neural network-based side information outperformed the audio codec enhancement without side information for the same codec operating at higher bitrates.
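The abstract describes two jointly trained networks. The transmitter-side network maps the LPS of the original signal and of the locally decoded signal to side information. The receiver-side network estimates the original LPS from the decoded LPS and the side information. The sketch below only illustrates that data flow, under stated assumptions. The framing, periodic Hann window, and `eps` floor in `log_power_spectrum` are assumptions. `toy_f` and `make_toy_g` are hypothetical stand-ins (one mean-offset value and its re-application), not the paper's trained networks. Mean squared error is used as an assumed example of the reconstruction loss that joint training would minimize.

```python
import cmath
import math

def log_power_spectrum(frame, eps=1e-10):
    """LPS of one frame: log(|X_k|^2 + eps) for k = 0..n/2, periodic Hann window."""
    n = len(frame)
    x = [s * (0.5 - 0.5 * math.cos(2 * math.pi * t / n)) for t, s in enumerate(frame)]
    lps = []
    for k in range(n // 2 + 1):
        X_k = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        lps.append(math.log(abs(X_k) ** 2 + eps))
    return lps

def transmitter(lps_orig, lps_dec, f):
    """Side-information network f sees both the original and the locally decoded LPS."""
    return f(lps_orig + lps_dec)  # list concatenation = stacked input features

def receiver(lps_dec, side_info, g):
    """Post-processing network g estimates the original LPS."""
    return g(lps_dec + side_info)

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Hypothetical stand-ins for the trained networks: f sends one mean-offset value,
# g adds it back to every decoded bin.
def toy_f(features):
    half = len(features) // 2
    orig, dec = features[:half], features[half:]
    return [sum(orig) / half - sum(dec) / half]

def make_toy_g(n_bins):
    def toy_g(features):
        dec, side = features[:n_bins], features[n_bins:]
        return [v + side[0] for v in dec]
    return toy_g

lps_orig = [1.0, 2.0, 3.0]
lps_dec = [0.5, 1.5, 2.5]  # decoded LPS uniformly 0.5 lower, for illustration
side = transmitter(lps_orig, lps_dec, toy_f)
estimate = receiver(lps_dec, side, make_toy_g(len(lps_dec)))
print(side, mse(estimate, lps_orig))  # [0.5] 0.0
```

The key design point mirrored here is analysis-by-synthesis. The transmitter decodes its own bitstream, so `f` can see what the receiver will see and spend its side information on what the decoded signal is missing.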
