An objective measure for predicting subjective quality of speech coders

Wang, S.; Sekey, A.; Gersho, A.

doi:10.1109/49.138987

Cited by 291 publications

(105 citation statements)

References 13 publications


“…The abilities of the PLP, BS and MFCC in providing speech representation with highly suppressed speaker-dependent information have been successfully demonstrated by their wide use in automatic speech recognition systems (ASR) [7,24,25]. In addition, an in-house investigation was undertaken to quantify these abilities, compare them to those of conventional speech analysis models such as the linear prediction (LP) technique [19], and choose orders of these models that best suit the proposed speech quality measure.…”

Section: Investigating Speaker Invariance Characteristics Of the PLP (mentioning)

confidence: 99%

“…In a similar fashion to the PLP model, Bark Spectrum (BS) analysis [7] aims to emulate several known features of perceptual processing of speech sounds by the human ear, specifically:…”

Section: Bark Spectrum Analysis (mentioning)

confidence: 99%

“…Three speech analysis models that are based on short-term spectrum of speech and use concepts of the psychophysics of hearing, such as the critical-band spectral resolution, the equal-loudness curve and the intensity-loudness power law to derive an estimate of the auditory spectrum [19], have been selected to produce three versions of the proposed speech quality measure (See Section 3.1 for details). The first version of the measure (Version I) utilises a 5th-order Perceptual Linear Prediction (PLP) model [24], the second version (Version II) utilises a 17th-order Bark Spectrum (BS) analysis model [7], and the third version (Version III) utilises a 13th-order Mel-Frequency Cepstrum Coefficients (MFCC) [25]. This selection was also based on the abilities of these speech analysis models in suppressing speaker-dependent information, as investigated in Section 3.2.…”

Section: The Proposed Output-based Speech Quality Measure (mentioning)

confidence: 99%

“…In the early 1990s, several new perceptual models for evaluating the quality of speech and audio coders emerged. For example, Wang et al [7] proposed an approach similar to that of Karjalainen, but without temporal masking, to compute loudness on a Sone scale in Bark bands and evaluate the mean squared Bark spectral distance (BSD). The perceptual approach was also explored for quality assessment of audio coders and systems.…”

Section: Introduction (mentioning)

confidence: 99%
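
The "mean squared Bark spectral distance" in the statement above can be illustrated with a minimal sketch. It assumes the per-frame Bark-band loudness vectors (in sones) have already been computed for the reference and the coded signal; that front end is not shown. The function name `bark_spectral_distance` is hypothetical, and any normalisation the original paper applies is omitted here.

```python
def bark_spectral_distance(ref_frames, deg_frames):
    """Mean, over frames, of the squared Euclidean distance between
    reference and degraded Bark-band loudness vectors.

    Assumption: each frame is a sequence of per-band loudness values
    (sones), already time-aligned between the two signals.
    """
    if not ref_frames or len(ref_frames) != len(deg_frames):
        raise ValueError("need the same non-zero number of frames in both signals")

    total = 0.0
    for ref, deg in zip(ref_frames, deg_frames):
        if len(ref) != len(deg):
            raise ValueError("frames must have the same number of Bark bands")
        # Squared Euclidean distance between the two loudness vectors.
        total += sum((r - d) ** 2 for r, d in zip(ref, deg))

    # Average the per-frame distances.
    return total / len(ref_frames)


# Toy example: two frames, two Bark bands each.
reference = [[1.0, 2.0], [0.0, 0.0]]
degraded = [[1.0, 2.0], [3.0, 4.0]]
print(bark_spectral_distance(reference, degraded))
```

A value of zero means the two loudness representations are identical; larger values indicate more perceptually weighted distortion under this sketch's assumptions.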


New single-ended objective measure for non-intrusive speech quality evaluation

Mahdi; Picovici. SIViP, 2008.


This article proposes a new output-based method for non-intrusive assessment of speech quality of voice communication systems and evaluates its performance. The method requires access to the processed (degraded) speech only, and is based on measuring perception-motivated objective auditory distances between the voiced parts of the output speech to appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The auditory distances are then mapped into objective Mean Opinion listening quality scores. An efficient data-mining tool known as the Self-Organizing Map (SOM) achieves the required clustering and mapping/reference matching processes. In order to obtain a perception-based, speaker-independent parametric representation of the speech, three domain transformation techniques have been investigated. The first technique is based on a Perceptual Linear Prediction (PLP) model, the second utilises a Bark Spectrum (BS) analysis and the third utilises Mel-Frequency Cepstrum Coefficients (MFCC). Reported evaluation results show that the proposed method provides high correlation with subjective listening quality scores, yielding accuracy similar to that of the ITU-T P.563 while maintaining a relatively low computational complexity. Results also demonstrate that the method outperforms the PESQ in a number of distortion conditions, such as those of speech degraded by channel impairments.
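
The reference-matching step described in the abstract can be sketched roughly as follows. This is an illustration only: plain Euclidean distance and a brute-force nearest-neighbour search stand in for the perception-motivated auditory distance and the SOM-based matching, and the mapping from distance to Mean Opinion Score is not shown. The function names are hypothetical.

```python
import math


def nearest_reference_distance(frame, codebook):
    """Distance from one feature vector to its closest codebook entry.

    Assumption: Euclidean distance is a stand-in for the auditory
    distance used by the method.
    """
    return min(math.dist(frame, ref) for ref in codebook)


def mean_codebook_distance(frames, codebook):
    """Average nearest-reference distance over all (voiced) frames."""
    if not frames or not codebook:
        raise ValueError("frames and codebook must both be non-empty")
    return sum(nearest_reference_distance(f, codebook) for f in frames) / len(frames)


# Toy example: a two-entry codebook and two degraded-speech frames.
codebook = [[0.0, 0.0], [10.0, 0.0]]
frames = [[3.0, 4.0], [10.0, 1.0]]
print(mean_codebook_distance(frames, codebook))
```

Because only the degraded signal and a clean-speech codebook are needed, a measure of this shape requires no access to the original input signal, which is what makes the approach non-intrusive.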

