2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8462529

Wavenet Based Low Rate Speech Coding

Abstract: Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high …

Cited by 119 publications
(95 citation statements)
References 31 publications
“…It is known that none of the well-established objective quality tools were designed to evaluate signals synthesized by non-deterministic generative models. In fact, it was shown in [29] that the enhanced quality achieved with a generative decoder was not predicted by the objective tool. We still conducted this evaluation to understand the performance with an objective quality predictor.…”
Section: Objective Evaluation (mentioning)
confidence: 99%
“…LPC is useful in modern neural speech codecs, too. While generative autoregressive models, such as WaveNet, have greatly improved the synthesized speech quality [12], it comes at the cost of model complexity during the decoding process [13]. For example, vector quantized variational autoencoders (VQ-VAE) with WaveNet decoder achieves impressive speech quality at a very low bitrate of 1.6 kbps, yet with approximately 20 million trainable parameters [14].…”
Section: Introduction (mentioning)
confidence: 99%
“…Many DNN methods [11][12] take inputs in time-frequency (T-F) domain from short time Fourier transform (STFT) or modified discrete cosine transform (MDCT), etc. Recent DNN-based codecs [13][14][15][16] model speech signals in time domain directly without T-F transformation. They are referred to as end-to-end methods, yielding competitive performance comparing with current speech coding standards, such as AMR-WB [7].…”
Section: Introductionmentioning
confidence: 99%