Time-varying linear prediction for speech analysis and synthesis

Schnell, Karl; Lacroix, A.

doi:10.1109/icassp.2008.4518516

Cited by 16 publications

(8 citation statements)

References 5 publications

“…In addition, it is noted that all the linear predictive analyses involved in the present study are classical in the sense that they use time-invariant filter coefficients that are updated once per frame. A more flexible paradigm is to utilize time-varying AR-modeling (e.g., Schnell and Lacroix, 2008; Rudoy et al., 2011) in which linear predictive filter coefficients evolve in time. Combining the proposed WLP-AME method with the time-varying AR modeling approach is another topic of future studies which would maybe help in detecting vocal tract variation in continuous high-pitched speech.…”

Section: Discussion (mentioning)

confidence: 99%

Formant frequency estimation of high-pitched vowels using weighted linear prediction

Alku

Pohjalainen

Vainio

et al. 2013

The Journal of the Acoustical Society of America

All-pole modeling is a widely used formant estimation method, but its performance is known to deteriorate for high-pitched voices. In order to address this problem, several all-pole modeling methods robust to fundamental frequency have been proposed. This study compares five such previously known methods and introduces a technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME utilizes temporally weighted linear prediction (LP) in which the square of the prediction error is multiplied by a given parametric weighting function. The weighting downgrades the contribution of the main excitation of the vocal tract in optimizing the filter coefficients. Consequently, the resulting all-pole model is affected more by the characteristics of the vocal tract leading to less biased formant estimates. By using synthetic vowels created with a physical modeling approach, the results showed that WLP-AME yields improved formant frequencies for high-pitched sounds in comparison to the previously known methods (e.g., relative error in the first formant of the vowel [a] decreased from 11% to 3% when conventional LP was replaced with WLP-AME). Experiments conducted on natural vowels indicate that the formants detected by WLP-AME changed in a more regular manner between repetitions of different pitch than those computed by conventional LP.
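The weighted LP criterion described in this abstract (squared prediction error multiplied by a weighting function) can be sketched as a weighted least-squares fit. The following is a minimal illustration, not the authors' WLP-AME implementation: the function name `weighted_lp`, the synthetic test signal, and the weighting values are all assumptions made for the example, and the actual AME weighting function is not reproduced here.

```python
import numpy as np

def weighted_lp(x, p, w):
    """Predictor coefficients a[1..p] minimizing
    sum_n w[n] * (x[n] - sum_k a[k] * x[n-k])**2 over n = p..N-1.
    The weights w must be non-negative."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    N = len(x)
    # Column k-1 holds the delayed signal x[n-k] for n = p..N-1.
    X = np.column_stack([x[p - k:N - k] for k in range(1, p + 1)])
    y = x[p:]
    # Scaling rows by sqrt(w) turns the weighted problem into ordinary least squares.
    sw = np.sqrt(w[p:])
    a, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return a

# Synthetic, noise-free two-pole resonance (illustrative data, not speech).
true_a = np.array([1.5, -0.9])
x = np.zeros(50)
x[0], x[1] = 1.0, 0.5
for n in range(2, len(x)):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2]

# Hypothetical weighting: attenuate samples around an assumed excitation instant.
w = np.ones(len(x))
w[18:23] = 0.01
a_hat = weighted_lp(x, p=2, w=w)
```

With all weights equal to one, this reduces to conventional covariance-method LP; the weighting only changes how much each sample's prediction error contributes to the fit.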

“…where the (n−p+1)th row of the matrix H_x ∈ R^{(N−p)×p(q+1)} is given by the Kronecker product (…). Maximizing (8) with respect to α therefore yields the least-squares solution of the following linear regression problem:…”

Section: Evaluation of the GLRT Statistic (mentioning)

confidence: 99%

“…Earlier work in this direction began with the fitting of piecewise-constant AR models to test for nonstationarity [3], [4]. However, in reality, the vocal tract often varies slowly, rather than as a sequence of abrupt jumps; to this end, [5]–[8] studied time-varying linear prediction using TVAR models. In a more general setting, Kay [9] recently proposed a version of… (Footnote: Based upon work supported in part by DARPA Grant HR0011-07-1-0007, DoD Air Force contract FA8721-10-C-0002, and an NSF Graduate Research Fellowship.)…”

Section: Introduction (mentioning)

confidence: 99%

Time-Varying Autoregressions in Speech: Detection Theory and Applications

Rudoy

Quatieri

Wolfe

2011

IEEE Trans. Audio Speech Lang. Process.

This article develops a general detection theory for speech analysis based on time-varying autoregressive models, which themselves generalize the classical linear predictive speech analysis framework. This theory leads to a computationally efficient decision-theoretic procedure that may be applied to detect the presence of vocal tract variation in speech waveform data. A corresponding generalized likelihood ratio test is derived and studied both empirically for short data records, using formant-like synthetic examples, and asymptotically, leading to constant false alarm rate hypothesis tests for changes in vocal tract configuration. Two in-depth case studies then serve to illustrate the practical efficacy of this procedure across different time scales of speech dynamics: first, the detection of formant changes on the scale of tens of milliseconds of data, and second, the identification of glottal opening and closing instants on time scales below ten milliseconds.
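The least-squares regression mentioned in the excerpt from this paper, with a design matrix of size (N−p)×p(q+1) built from Kronecker products, can be sketched as follows. This is a rough illustration under stated assumptions rather than the paper's procedure: each AR coefficient is assumed to be a linear combination of q+1 basis functions, the basis is taken to be powers of normalized time, and the ordering of the Kronecker factors and the names `tvar_fit` and `tvar_coeffs` are choices made for this example.

```python
import numpy as np

def tvar_fit(x, p, q):
    """Least-squares fit of a time-varying AR(p) model
        x[n] ~ sum_i a_i[n] * x[n-i],   a_i[n] = sum_j alpha[i, j] * F[n, j],
    using an assumed polynomial basis F[n, j] = (n / N)**j, j = 0..q."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    t = np.arange(N) / N                       # normalized time in [0, 1)
    F = np.vander(t, q + 1, increasing=True)   # shape (N, q+1)
    # Row for time n: Kronecker product of [x[n-1], ..., x[n-p]] with F[n].
    H = np.array([np.kron(x[n - p:n][::-1], F[n]) for n in range(p, N)])
    alpha, *_ = np.linalg.lstsq(H, x[p:], rcond=None)   # H has shape (N-p, p*(q+1))
    return alpha.reshape(p, q + 1), F

def tvar_coeffs(alpha, F):
    """Coefficient trajectories: column i-1 of the result holds a_i[n]."""
    return F @ alpha.T                         # shape (N, p)
```

Setting q = 0 leaves a single constant basis function, in which case the fit reduces to conventional time-invariant linear prediction over the same interval.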

“…Even the advanced formant tracking algorithms which directly track formants from the cepstral coefficients use this piecewise approximation of the vocal tract system [8,9]. Time varying linear prediction (TVLP) tries to bridge this gap by modeling the speech signal over longer intervals of time by defining the vocal tract model parameters as a function of time [23][24][25].…”

Section: Introduction (mentioning)

confidence: 99%

Time-Varying Quasi-Closed-Phase Weighted Linear Prediction Analysis of Speech for Accurate Formant Detection and Tracking

Gowda

Alku

2016

Interspeech 2016

In this paper, we propose a new method for accurate detection, estimation and tracking of formants in speech signals using time-varying quasi-closed phase analysis (TVQCP). The proposed method combines two different methods of analysis, namely time-varying linear prediction (TVLP) and quasi-closed phase (QCP) analysis. TVLP helps in better tracking of formant frequencies by imposing a time-continuity constraint on the linear prediction (LP) coefficients. QCP analysis, a type of weighted LP (WLP), improves the estimation accuracy of the formant frequencies by using a carefully designed weight function on the error signal that is minimized. The QCP weight function emphasizes the closed-phase region of the glottal cycle, and also weights down the regions around the main excitations. This results in reduced coupling of the subglottal cavity and the excitation source. Experimental results on natural speech signals show that the proposed method performs considerably better than the detect-and-track approach used in popular tools like Wavesurfer or Praat.
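Formant estimates of the kind discussed in this abstract are commonly read off the poles of the fitted all-pole model. The sketch below shows that generic step only; it is not the TVQCP method, and the function name `formants_from_lp` and the coefficient sign convention (x[n] ≈ Σ a[k] x[n−k]) are assumptions made for the example.

```python
import numpy as np

def formants_from_lp(a, fs):
    """Candidate formant frequencies and bandwidths (Hz) from predictor
    coefficients a[1..p] of the model x[n] ~ sum_k a[k] * x[n-k]."""
    # Poles are the roots of z**p - a[1]*z**(p-1) - ... - a[p].
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    roots = np.roots(poly)
    roots = roots[np.imag(roots) > 0]            # keep one pole of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)   # pole angle -> frequency
    bws = -np.log(np.abs(roots)) * fs / np.pi    # pole radius -> 3 dB bandwidth
    order = np.argsort(freqs)
    return freqs[order], bws[order]
```

In a time-varying analysis, the same conversion would be applied to the coefficient values at each time instant of interest, giving frequency tracks rather than a single set of values per frame.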
