2009
DOI: 10.1109/tasl.2008.2006647

Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

Abstract: In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. Here, we investigate six m…

Cited by 269 publications (200 citation statements)
References 48 publications
“…On the other hand, for the SI case, the speakers in the training set do not overlap the speakers in the test set and the number of the utterances for each speaker of the test set is very small. The HMM model of each speaker in the test set is therefore adapted from the HMM of the SD case by the CMAPLR approach [51] given solely the test utterances. Figure 3 shows the schematic diagram of the HMM-based speech synthesizer used in this study.…”
Section: Speech Synthesis | Type: mentioning
confidence: 99%
“…It is mainly due to SPSS advantages over traditional concatenative speech synthesis approaches; these advantages include the flexibility to change voice characteristics [3][4][5], multilingual support [6][7][8], coverage of acoustic space [1], small footprint [1], and robustness [4,9]. All of the above advantages stem from the fact that SPSS provides a statistical model for acoustic features instead of using original speech waveforms.…”
Section: Introduction | Type: mentioning
confidence: 99%
“…Statistical parametric synthesis [1] using hidden Markov models (HMM) has proven to be a particularly flexible and robust framework for performing speaker transformation, leveraging off a range of speaker adaptation techniques previously developed for automatic speech recognition (ASR) [2]. Maximum likelihood linear transformation (MLLT) based adaptation techniques entail linear transformation of the means and variances of an HMM to match the characteristics of the speech for a given speaker.…”
Section: Introduction | Type: mentioning
confidence: 99%
“…al. [2] showed that due to the presence of hierarchical prior, constrained SMAP linear regression (CSMAPLR) is a more robust adaptation framework when compared to CMLLR in the context of statistical parametric speech synthesis.…”
Section: Introduction | Type: mentioning
confidence: 99%
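
Several of the statements above describe adaptation as a linear transformation of the means and variances of an HMM. As a rough illustration only, the sketch below applies one shared matrix and bias to a single Gaussian, the general form used by constrained transforms such as CMLLR: mean' = A·mean + b and cov' = A·cov·Aᵀ. The function name `adapt_gaussian` and all numeric values are hypothetical; the sketch does not estimate the transform from data and does not implement the structural MAP prior that CSMAPLR adds.

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of matrix A."""
    return [list(col) for col in zip(*A)]


def adapt_gaussian(mean, cov, A, b):
    """Apply a constrained linear transform to one Gaussian.

    The same matrix A is used for the mean and the covariance:
    mean' = A mean + b, cov' = A cov A^T.
    """
    new_mean = [m + bi for m, bi in zip(matvec(A, mean), b)]
    new_cov = matmul(matmul(A, cov), transpose(A))
    return new_mean, new_cov


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
mean = [1.0, 2.0]
cov = [[1.0, 0.0], [0.0, 4.0]]
A = [[2.0, 0.0], [0.0, 0.5]]
b = [0.5, -1.0]

adapted_mean, adapted_cov = adapt_gaussian(mean, cov, A, b)
print(adapted_mean)
print(adapted_cov)
```

In a real system the pair (A, b) would be estimated from the target speaker's adaptation data and shared across many Gaussians, typically grouped by a regression class tree; here it is simply given.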