Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification

Atal, Bishnu S.

doi:10.1121/1.1914702

Cited by 759 publications

(179 citation statements)


“…$\frac{1}{500}\sum_{i=1}^{500}(\ldots)$ has been computed and the empirical covariance matrix over the 500 examples has been calculated similarly to (2). Table II is the inverse Fourier transform of the product of $1/S_L(\omega)$ and …$e^{-j\tau\omega}$.…”

Section: Results (mentioning, confidence: 99%)
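The excerpt describes averaging over 500 examples and forming an empirical covariance matrix. A minimal pure-Python sketch of that computation follows. The function name `empirical_mean_cov` is hypothetical, and the 1/N normalization is an assumption, because equation (2) of the citing paper is not reproduced here.

```python
from typing import List, Sequence, Tuple


def empirical_mean_cov(
    vectors: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Empirical mean and covariance of N equal-length feature vectors.

    Assumption: both the mean and the covariance use 1/N normalization.
    """
    n = len(vectors)
    if n == 0:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    # Component-wise sample mean.
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    # Accumulate outer products of the centered vectors.
    cov = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        d = [v[j] - mean[j] for j in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j] / n
    return mean, cov
```

In the excerpt's setting each vector would be one cepstral vector (one of the 500 examples), and the resulting matrix, scaled by the sample size, could be compared with the identity.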

“…} is the state set, $g\colon S \times \mathcal{B} \to S$ is the state transition function, and $h\colon S \to \mathcal{A}$ is the output function; explicitly, $s_{n+1} = g(s_n, b_n)$, $a_n = h(s_n)$ (2), where $s_n$, $b_n$, and $a_n$ denote the state, input, and output processes.…”

Section: Review of the Spectral Analysis of the Output of an SM (mentioning, confidence: 99%)
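The state-space description of a sequential machine (SM) can be simulated directly. The sketch below assumes the update $s_{n+1} = g(s_n, b_n)$, $a_n = h(s_n)$ driven by i.i.d. inputs drawn from a finite alphabet. The helper names and the toy two-state machine are hypothetical illustrations, not the construction used in the citing paper.

```python
import random
from typing import Callable, Hashable, List, Sequence


def iid_inputs(alphabet: Sequence[Hashable], probs: Sequence[float],
               n: int, seed: int = 0) -> List[Hashable]:
    """Draw n i.i.d. input symbols b_0..b_{n-1} with the given probabilities."""
    rng = random.Random(seed)
    return rng.choices(list(alphabet), weights=list(probs), k=n)


def run_sequential_machine(
    g: Callable[[Hashable, Hashable], Hashable],
    h: Callable[[Hashable], float],
    s0: Hashable,
    inputs: Sequence[Hashable],
) -> List[float]:
    """Return outputs a_0..a_{N-1} for s_{n+1} = g(s_n, b_n), a_n = h(s_n)."""
    s = s0
    outputs: List[float] = []
    for b in inputs:
        outputs.append(h(s))  # the output depends only on the current state
        s = g(s, b)           # next state from current state and current input
    return outputs


# Hypothetical toy machine: the state remembers the last input bit,
# and the output maps state 0 -> -1.0 and state 1 -> +1.0.
def g_last_bit(s: int, b: int) -> int:
    return b


def h_sign(s: int) -> float:
    return 1.0 if s == 1 else -1.0


if __name__ == "__main__":
    bits = iid_inputs([0, 1], [0.5, 0.5], n=8, seed=1)
    print(bits, run_sequential_machine(g_last_bit, h_sign, 0, bits))
```

With this toy choice the output is a one-step-delayed, sign-mapped copy of the input, so its spectrum is flat; richer spectra require machines with more states or a state-dependent output map.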

“…The general problem of synthesizing an assigned rational spectral density by means of an SM fed by a suitable input formed by independent and identically distributed (i.i.d.) symbols was dealt with by Mullis and Roberts [2], who proved that such a synthesis is possible for any rational spectral density. Unfortunately, the proof is not constructive and does not convey any suggestion about the choice of the SM and of the input probabilities.…”

Section: Introduction (mentioning, confidence: 99%)

“…types of feature vectors, the cepstrum provides the best performance in speech recognition [6] and speaker verification [2] applications.…”

Section: Introduction (mentioning, confidence: 99%)


On the asymptotic statistical behavior of empirical cepstral coefficients

Merhav

Lee

1993

IEEE Trans. Signal Process.


Abstract: The asymptotic covariance matrix of the empirical cepstrum is analyzed. We show that for Gaussian processes, cepstral coefficients derived from smoothed periodograms are asymptotically uncorrelated and their variances multiplied by the sample size $T$ tend to unity. For an autoregressive process and its autoregressive cepstrum estimate, somewhat weaker results hold.

Manuscript received September 4, 1991; revised June 15, 1992. The associate editor coordinating the review of this correspondence and approving it for publication was Prof. Georgios B. Giannakis. N. Merhav is with the Department of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel. C.-H. Lee is with the Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ 07974. IEEE Log Number 9207542.

I. INTRODUCTION

Cepstral analysis is useful in the preprocessing of many speech recognition and speaker verification systems (see, e.g., [1]-[6]). This is based on strong experimental evidence that among many types of feature vectors, the cepstrum provides the best performance in speech recognition [6] and speaker verification [2] applications. It is of interest, in light of this fact, to investigate the asymptotic statistical properties of the empirical cepstral vector. We examine both analytically (Section II) and experimentally (Section III) the covariance matrix of this vector when extracted from a stationary random process in two cases. First, an underlying stationary Gaussian process is assumed and we confine interest to the cepstrum derived from the smoothed periodogram [7]. The cepstral components are shown to be asymptotically uncorrelated and their variances, when multiplied by the sample size $T$, tend to unity as $T \to \infty$. In the second case, an autoregressive (AR) process (not necessarily Gaussian) is assumed and we focus on the cepstrum derived from the empirical AR power spectral density (PSD), which is a parametric estimator of the PSD.
Here the covariance matrix, when multiplied by $T$, tends to the identity matrix in the weak norm sense (Hilbert-Schmidt), which is a weaker form of convergence than in the former case. Thus, in both cases the asymptotic covariance matrix is, in a sense, equivalent to the identity matrix independently of the underlying PSD.

This "orthonormality" property of the cepstral vector, regardless of the PSD, does not exist in many other feature vectors commonly used in speech processing, e.g., the AR parameter vector, the vector of reflection coefficients, and the DFT coefficients. It is interesting to note, however, that the log-spectral energies (which are related to the cepstrum via a Fourier transform) do have the above-mentioned covariance orthonormality property under some conditions [10]. This will be discussed more deeply in Section II.

One implication of these results is that, essentially, only the cepstral means carry useful information regarding the PSD, while the cepstral variances are relatively insensitive to the PSD. This observation has also been supported experimentally b...
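The autoregressive cepstrum estimate mentioned above is commonly obtained from the AR (LPC) coefficients through the standard recursion $c_n = a_n + \sum_{k=1}^{n-1} \frac{k}{n} c_k a_{n-k}$ for $1 \le n \le p$; for $n > p$ the $a_n$ term is dropped and the sum runs over $n-p \le k \le n-1$. The sketch below assumes the all-pole model $G / (1 - \sum_{k=1}^{p} a_k z^{-k})$ is stable (minimum phase), so the $c_n$ are the coefficients of the power-series expansion of its logarithm. The name `lpc_to_cepstrum` is hypothetical.

```python
import math
from typing import List, Sequence


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int, gain: float = 1.0) -> List[float]:
    """Cepstrum c_0..c_{n_ceps-1} of the all-pole model G / (1 - sum_k a_k z^{-k}).

    a[k-1] holds predictor coefficient a_k (convention: x_n ~ sum_k a_k x_{n-k}).
    Assumes a stable, minimum-phase model and gain > 0.
    """
    if gain <= 0.0:
        raise ValueError("gain must be positive")
    p = len(a)
    c = [0.0] * n_ceps
    if n_ceps == 0:
        return c
    c[0] = math.log(gain)  # c_0 = ln G
    for n in range(1, n_ceps):
        acc = a[n - 1] if n <= p else 0.0  # direct term only while n <= p
        # Sum over k with 1 <= k <= n-1 and 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c
```

For a single pole at $\alpha$ (that is, `a = [alpha]`), the recursion reproduces the closed form $c_n = \alpha^n / n$, which makes a quick sanity check.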



“…Linear predictive coding (LPC) is used because of its simplicity and effectiveness in speaker/speech recognition [1,2]. Another widely used feature set, mel-frequency cepstral coefficients (MFCC), is used [3] because these coefficients are calculated with a filter-bank approach in which the filters have equal bandwidth with respect to the mel-scale frequencies.…”

Section: Introduction (mentioning, confidence: 99%)
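LPC parameters such as those cited here are typically computed with the autocorrelation method, solving the normal equations by the Levinson-Durbin recursion. The following is a minimal sketch under that assumption (the excerpt does not say which method the citing paper uses). Function names are hypothetical, and the sign convention is $\hat{x}_n = \sum_{j=1}^{p} a_j x_{n-j}$.

```python
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Biased (unnormalized) autocorrelation r_0..r_max_lag of a frame."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], List[float], float]:
    """Solve the autocorrelation normal equations for an order-p linear predictor.

    Returns (a, refl, err): predictor coefficients a_1..a_p, reflection
    coefficients k_1..k_p, and the final prediction-error power.
    """
    if order >= len(r):
        raise ValueError("need at least order + 1 autocorrelation lags")
    err = float(r[0])
    a: List[float] = []
    refl: List[float] = []
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("autocorrelation is not positive definite")
        # Reflection coefficient from the current prediction residual.
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        k = acc / err
        # a_j^(i) = a_j^(i-1) - k * a_{i-j}^(i-1) for j < i, then a_i^(i) = k.
        a = [a[j - 1] - k * a[i - j - 1] for j in range(1, i)] + [k]
        refl.append(k)
        err *= 1.0 - k * k  # prediction-error power shrinks at each order
    return a, refl, err
```

For the autocorrelation $r = [1, 0.5, 0.25]$ of a first-order process, an order-2 fit returns $a = [0.5, 0]$ with prediction-error power 0.75, so the second coefficient vanishes as expected.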

Robust Speaker Identification System Based on Wavelet Transform and Gaussian Mixture Model

Chen

Hsieh

Lai

2005

Natural Language Processing – IJCNLP 2004


This paper presents an effective and robust method for extracting features for speech processing. Based on the time-frequency multiresolution property of the wavelet transform, the input speech signal is decomposed into various frequency channels. To capture the characteristics of the vocal tract and vocal cords, the traditional linear predictive cepstral coefficients (LPCC) of the approximation channel and the entropy of the detail channel are calculated for each decomposition stage. In addition, a hard thresholding technique is applied at each lower resolution to remove interference from noise. Experimental results show that this mechanism not only effectively reduces the influence of noise but also improves recognition performance. Finally, the proposed feature extraction algorithm is evaluated on the MAT telephone speech database for text-independent speaker identification using a Gaussian Mixture Model (GMM) identifier. Some popular existing methods are also evaluated for comparison. The results show that the proposed feature extraction method is more effective and robust than the other methods. In addition, the performance of our method remains very satisfactory even at low SNR.
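The abstract does not specify the wavelet, the entropy definition, or exactly which coefficients are thresholded. The sketch below therefore assumes an orthonormal Haar wavelet, hard thresholding of the detail coefficients, and the Shannon entropy of the normalized detail energies. It only illustrates how approximation and detail channels can be produced level by level; the LPCC of the final approximation channel would then come from a standard LPC-to-cepstrum step. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of an orthonormal Haar DWT; returns (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def hard_threshold(coeffs: Sequence[float], t: float) -> List[float]:
    """Zero every coefficient whose magnitude does not exceed t; keep the rest."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def energy_entropy(coeffs: Sequence[float]) -> float:
    """Shannon entropy (nats) of the normalized energies c_i^2 / sum_j c_j^2."""
    total = sum(c * c for c in coeffs)
    if total == 0.0:
        return 0.0  # convention: an all-zero channel carries no entropy
    probs = [c * c / total for c in coeffs if c != 0.0]
    return -sum(p * math.log(p) for p in probs)


def wavelet_channels(x: Sequence[float], levels: int,
                     t: float) -> Tuple[List[float], List[float]]:
    """Decompose x; return the final approximation and per-level detail entropies."""
    approx = list(x)
    entropies: List[float] = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        entropies.append(energy_entropy(hard_threshold(detail, t)))
    return approx, entropies
```

Hard thresholding keeps large coefficients unchanged and zeroes small ones, which is why it is often preferred when the surviving coefficient values should not be shrunk.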


An HMM adaptation method for noise and distortion by maximizing likelihood

Minami

Furui

1998

Electron. Comm. Jpn. Pt. III

