The recently standardized 3GPP codec for Enhanced Voice Services (EVS) offers new features and improvements for low-delay real-time communication systems. Based on a novel, switched low-delay speech/audio codec, the EVS codec contains various tools for better compression efficiency and higher quality for clean/noisy speech, mixed content and music, including support for wideband, super-wideband and full-band content. The EVS codec operates over a broad range of bitrates, is highly robust against packet loss and provides an AMR-WB interoperable mode for compatibility with existing systems. This paper gives an overview of the underlying architecture as well as the novel technologies in the EVS codec and presents listening test results showing the performance of the new codec in terms of compression and speech/audio quality.
A new codec for Enhanced Voice Services (EVS), the successor of the current mobile HD voice codec AMR-WB, was standardized by the 3rd Generation Partnership Project (3GPP) in September 2014. The EVS codec addresses 3GPP's need for cutting-edge technology that lets 3GPP mobile communication systems operate as competitively as possible in terms of communication quality and efficiency. This paper provides in-depth insight into 3GPP's rigorous and transparent processes, which made it possible for the mobile industry, with its many competing players, to develop and standardize a codec in an open, fair and constructive way. The paper also gives an overview of the EVS codec technology, the standard specifications, and the performance of the codec that will elevate HD voice services to the next quality level.
Based on two well-known auditory models, this work investigates whether the squared error between an original signal and a phase-distorted signal is a perceptually relevant measure for distortions in the Fourier phase spectrum of periodic signals obtained from speech. Both the performance of phase vector quantizers and the direct relationship between the squared error and two perceptual distortion measures are studied. The results indicate that, for small values, the squared error correlates well with the perceptual measures. For large errors, however, an increase in squared error does not, on average, lead to an increase in the perceptual measures. Empirical rate-perceptual distortion curves and listening tests confirm that, for low to medium codebook sizes, the average perceived distortion does not decrease with increasing codebook size when the squared error is used as the encoding criterion.
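As a small illustration of the quantity discussed above, the following sketch computes the squared error between one period of a harmonic signal and a copy whose harmonic phases have been perturbed. It is not the procedure used in the paper: the signal model, the function names and the example values are all assumptions made here for illustration. It relies only on the standard identity that, for a sum of orthogonal harmonics with amplitudes A_k and phase errors d_k, the mean squared error equals the sum over k of A_k^2 (1 - cos d_k).

```python
import math

def synth(amps, phases, n_samples):
    """One period of sum_k A_k * cos(2*pi*k*n/N + phi_k), for k = 1..K (hypothetical signal model)."""
    out = []
    for n in range(n_samples):
        s = 0.0
        for k, (a, p) in enumerate(zip(amps, phases), start=1):
            s += a * math.cos(2.0 * math.pi * k * n / n_samples + p)
        out.append(s)
    return out

def mean_squared_error(x, y):
    """Time-domain mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def phase_mse_closed_form(amps, phase_errors):
    """Mean squared error due to phase errors alone: sum_k A_k^2 * (1 - cos d_k)."""
    return sum(a * a * (1.0 - math.cos(d)) for a, d in zip(amps, phase_errors))

# Illustrative values only (not taken from the paper).
amps = [1.0, 0.5, 0.25]
phases = [0.3, -1.1, 2.0]
errors = [0.2, -0.4, 1.0]
N = 64  # samples per period; must exceed twice the highest harmonic number

x = synth(amps, phases, N)
y = synth(amps, [p + d for p, d in zip(phases, errors)], N)

mse_time = mean_squared_error(x, y)
mse_closed = phase_mse_closed_form(amps, errors)
```

Under these assumptions the two values agree to within floating-point precision. The closed form also shows that the per-harmonic squared error is bounded by 2 A_k^2, reached at a phase error of pi.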
In this article, we introduce multi-variate block polar quantization (MBPQ). MBPQ minimizes a weighted distortion for a set of complex variables representing one block of a signal under a resolution constraint for the entire block. MBPQ allows for different probability distributions in different dimensions of the set of complex variables. It outperforms a previously introduced block polar quantizer and unrestricted polar quantization (UPQ), both for Gaussian complex variables and for sinusoids found from audio data. In the case of audio data, we found a performance gain of about 2.5 dB over the best-performing conventional resolution-constrained polar quantization (UPQ).
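For readers unfamiliar with polar quantization, the sketch below shows the basic idea in its simplest form: a complex value is quantized by magnitude and phase, and the number of phase levels is allowed to depend on the quantized magnitude. This is not MBPQ and not the optimized UPQ from the article. The uniform magnitude quantizer, the rule for choosing the number of phase levels, and the names `polar_quantize`, `mag_step` and `phase_cells_per_unit` are assumptions introduced here purely for illustration.

```python
import cmath
import math

def polar_quantize(z, mag_step, phase_cells_per_unit):
    """Quantize a complex number in polar form (illustrative sketch).

    Magnitude: uniform quantizer with step mag_step, reconstructing at cell midpoints.
    Phase: uniform quantizer whose number of levels grows with the reconstructed
    magnitude, so that outer rings get finer angular resolution.
    Returns (magnitude index, phase index, reconstructed complex value).
    """
    r, theta = cmath.polar(z)               # theta lies in [-pi, pi]
    r_idx = int(r // mag_step)              # index of the magnitude cell
    r_hat = (r_idx + 0.5) * mag_step        # midpoint reconstruction
    n_phase = max(1, round(phase_cells_per_unit * r_hat))
    step = 2.0 * math.pi / n_phase
    p_idx = int(round(theta / step)) % n_phase   # wrap negative angles onto 0..n_phase-1
    theta_hat = p_idx * step
    return r_idx, p_idx, cmath.rect(r_hat, theta_hat)

# Illustrative use with arbitrary parameter values.
r_idx, p_idx, z_hat = polar_quantize(1.0 + 1.0j, mag_step=0.25, phase_cells_per_unit=16)
```

With this scheme the magnitude error never exceeds half of `mag_step`, and the phase error never exceeds half of the angular step of the selected ring. A resolution-constrained design such as the ones the article compares would instead choose the magnitude and phase levels to minimize distortion for a fixed number of cells.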