Rapid speaker adaptation in eigenvoice space

Kuhn, Roland; Junqua, Jean‐Claude; Nguyen, Patrick; Niedzielski, Nancy

doi:10.1109/89.876308

Cited by 421 publications

(286 citation statements)

References 36 publications

Citation types: Supporting, Mentioning (278), Contrasting, Unclassified

“…Adaptation techniques such as MAP [24], MLLR [25], and eigenspace-based techniques [26] are often used to solve this problem. Although eigenspace-based techniques are effective when adaptation data are extremely small [38], they restrict the model to a lower dimensionality where much information might be lost [39]. On the other hand, MLLR and MAP do not impose this restriction on the models.…”

Section: Statistical Modeling Based On Gaussian Mixture Models (mentioning)

confidence: 99%

A statistical approach for person verification using human behavioral patterns

et al. 2013

We propose a person verification method using behavioral patterns of human upper body motion. Behavioral patterns are represented by three-dimensional features obtained from a time-of-flight camera. We take a statistical approach to model the behavioral patterns using Gaussian mixture models (GMM) and support vector machines. We employ the maximum likelihood linear regression adaptation method to estimate GMM parameters with a limited amount of data. Experimental results show that it reduced the equal error rate by 28.6% relative to a system using maximum likelihood estimation, with 25 samples per subject. We also demonstrate that the proposed approach is robust against variations in body motion over time.

“…However, if we increase the number of representative HMM sets to enhance the capabilities of representation, it is difficult to determine the interpolation ratio to obtain the required voice. To address this problem, Shichiri et al. applied the eigenvoice technique (Kuhn et al., 2000) to HMM-based speech synthesis (Shichiri et al., 2002). A speaker-specific "super-vector" was composed by concatenating the mean vectors of all state-output distributions in the model set for each of the S speaker-dependent HMM sets.…”

Section: Eigenvoice (Producing Voices) (mentioning)

confidence: 99%
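
The supervector construction described in this excerpt can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: it assumes each speaker-dependent model is reduced to a list of state mean vectors, and that the eigenvoices are obtained by PCA over the S supervectors (computed here with an SVD). The function names `build_supervector` and `eigenvoices`, and the toy dimensions, are hypothetical.

```python
import numpy as np

def build_supervector(state_means):
    """Concatenate the mean vectors of all state-output distributions."""
    return np.concatenate([np.asarray(m, dtype=float) for m in state_means])

def eigenvoices(supervectors, k):
    """Return (mean supervector, top-k principal directions) via SVD-based PCA."""
    X = np.vstack(supervectors)            # shape (S, D): one row per speaker
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

# Toy data (assumption): S = 4 speakers, 3 states each, 2-dimensional means.
rng = np.random.default_rng(0)
speakers = [[rng.normal(size=2) for _ in range(3)] for _ in range(4)]
supervectors = [build_supervector(s) for s in speakers]
mean, basis = eigenvoices(supervectors, k=2)
```

In this sketch each supervector has dimension 3 x 2 = 6, and `basis` holds two orthonormal directions in that space.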

Statistical Parametric Speech Synthesis

Black, Zen, Tokuda 2007

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07

209

This review gives a general overview of techniques used in statistical parametric speech synthesis. One instance of these techniques, called hidden Markov model (HMM)-based speech synthesis, has recently been demonstrated to be very effective in synthesizing acceptable speech. This review also contrasts these techniques with the more conventional technique of unit-selection synthesis that has dominated speech synthesis over the last decade. The advantages and drawbacks of statistical parametric synthesis are highlighted and we identify where we expect key developments to appear in the immediate future.

“…This method achieves efficient adaptation. Adaptation techniques which only need a small amount of target speech data, such as those using inter-speaker variation modeling like Eigenvoice [5], have also been proposed. In this framework, the super vectors of the mean parameters of the speaker-dependent acoustic models are used as bases, and the super vector of the new speaker-specific acoustic models is expressed as a linear combination of these bases.…”

Section: Introduction (mentioning)

confidence: 99%
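
The linear-combination idea in this excerpt can be illustrated with a small sketch. It makes a simplifying assumption: the target supervector is known and the weights are found by ordinary least squares, whereas in practice the weights would be estimated from a small amount of adaptation speech (for example by maximum likelihood). The names `combination_weights` and `combine`, and all numbers, are hypothetical.

```python
import numpy as np

def combine(bases, weights):
    """Supervector formed as a weighted sum of the basis supervectors."""
    return weights @ bases

def combination_weights(bases, target):
    """Least-squares weights w minimizing ||bases.T @ w - target||."""
    w, *_ = np.linalg.lstsq(bases.T, target, rcond=None)
    return w

# Toy bases (assumption): 3 basis supervectors of dimension 4.
bases = np.array([[1.0, 0.0, 2.0, 1.0],
                  [0.0, 1.0, 1.0, 3.0],
                  [2.0, 1.0, 0.0, 1.0]])
true_w = np.array([0.5, 0.3, 0.2])
target = combine(bases, true_w)           # a "new speaker" inside the span
w = combination_weights(bases, target)    # recovers the mixing weights
```

Only as many weights as there are bases need to be estimated, which is why this family of methods can work with very little adaptation data.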

Acoustic Model Training Using Pseudo-Speaker Features Generated by MLLR Transformations for Robust Speaker-Independent Speech Recognition

Itoh, Hara, Kitaoka, et al. 2012

IEICE Trans. Inf. & Syst.

SUMMARY: A novel speech feature generation-based acoustic model training method for robust speaker-independent speech recognition is proposed. For decades, speaker adaptation methods have been widely used. All of these adaptation methods need adaptation data. However, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by adopting inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then we train our models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the transformation matrices for the existing speakers is estimated. Next, we construct pseudo-speaker transformations by sampling the weight parameters from the distribution, and apply the transformation to the normalized features of the existing speaker to generate the features of the pseudo-speakers. Finally, using these features, we train the acoustic models. Evaluation results show that the acoustic models trained using our proposed method are robust for unknown speakers.
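
The pipeline in this summary (PCA over transformation matrices, a distribution over the weights, sampling, and applying the sampled transformation) can be sketched as below. This is a rough illustration under stated assumptions, not the authors' method: the transforms are random toy affine matrices of shape d x (d+1) rather than estimated MLLR matrices, the weight distribution is taken to be an independent Gaussian per component, and the inverse-transformation and feature-normalization steps are omitted. All function names are hypothetical.

```python
import numpy as np

def fit_transform_basis(transforms, k):
    """PCA over flattened transforms; returns mean, basis, per-speaker weights."""
    X = np.stack([t.ravel() for t in transforms])     # (S, d*(d+1))
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:k]                                    # (k, d*(d+1))
    weights = (X - mean) @ basis.T                    # (S, k)
    return mean, basis, weights

def sample_pseudo_transform(mean, basis, weights, shape, rng):
    """Sample weights from a per-component Gaussian and rebuild a transform."""
    w = rng.normal(weights.mean(axis=0), weights.std(axis=0))
    return (mean + w @ basis).reshape(shape)

def apply_transform(W, features):
    """Apply the affine map x -> A x + b, with W = [A | b], to each row."""
    A, b = W[:, :-1], W[:, -1]
    return features @ A.T + b

rng = np.random.default_rng(0)
d = 3
identity = np.hstack([np.eye(d), np.zeros((d, 1))])
# Toy transforms (assumption): 5 speakers, small perturbations of identity.
transforms = [identity + 0.1 * rng.normal(size=(d, d + 1)) for _ in range(5)]
mean, basis, weights = fit_transform_basis(transforms, k=2)
W = sample_pseudo_transform(mean, basis, weights, (d, d + 1), rng)
features = rng.normal(size=(10, d))                   # 10 toy feature frames
pseudo_features = apply_transform(W, features)
```

Each call to `sample_pseudo_transform` yields a different pseudo-speaker, so repeated sampling would produce a larger and more varied training set from the same source features.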
