Takao Kobayashi scite author profile

Speech parameter generation algorithms for HMM-based speech synthesis

This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which a speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (the state and mixture sequence in the multi-mixture case), or a part of it, is unobservable (i.e., hidden or latent). As a result, the algorithm iterates between the forward-backward algorithm and the parameter generation algorithm for the case where the state sequence is given. Experimental results show that the algorithm reproduces a clearer formant structure from multi-mixture HMMs than from single-mixture HMMs.
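The inner step mentioned in this abstract, generating parameters when the state sequence is already given, can be sketched roughly as a weighted least-squares problem. The sketch below is an illustration under stated assumptions, not the paper's implementation: it assumes a one-dimensional static feature, a single delta window 0.5 * (c[t+1] - c[t-1]) clamped at the edges, and diagonal covariances. The function names `build_window_matrix` and `generate_parameters` are hypothetical, and the forward-backward iteration over hidden states is not shown.

```python
import numpy as np


def build_window_matrix(T):
    """Return W (2T x T) mapping static features c to [static, delta] observations."""
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0  # static row: o_t = c_t
        # delta row (assumed window): 0.5 * (c_{t+1} - c_{t-1}), clamped at edges
        W[2 * t + 1, max(t - 1, 0)] -= 0.5
        W[2 * t + 1, min(t + 1, T - 1)] += 0.5
    return W


def generate_parameters(means, variances):
    """Solve (W' P W) c = W' P mu for the static trajectory c.

    means, variances: arrays of shape (T, 2) holding the [static, delta]
    Gaussian mean and variance of the state assigned to each frame.
    P is the diagonal precision matrix (1 / variance).
    """
    T = means.shape[0]
    W = build_window_matrix(T)
    mu = means.reshape(-1)              # ordered static_0, delta_0, static_1, ...
    prec = 1.0 / variances.reshape(-1)  # diagonal of P
    A = W.T @ (prec[:, None] * W)
    b = W.T @ (prec * mu)
    return np.linalg.solve(A, b)
```

When the static and delta means are mutually consistent, the solution reproduces the static means exactly; when they conflict (for example, a step in the static means with zero delta means), the delta constraints smooth the trajectory across state boundaries.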


Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm

Yamagishi

Kobayashi

Nakano

et al. 2009

IEEE Trans. Audio Speech Lang. Process.

269

197


In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. Here, we investigate six major aspects of the speaker adaptation: initial models; the amount of the training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. Analyzing the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of the gender-dependent models to single use of the gender-dependent models. Analyzing the effect of the transform functions, we compare the transform function for only mean vectors with that for mean vectors and covariance matrices. Analyzing the effect of the estimation criteria, we compare the ML criterion with a robust estimation criterion called structural MAP. We evaluate the sensitivity of several thresholds for the piecewise linear regression algorithms and take up methods combining MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis.


An adaptive algorithm for mel-cepstral analysis of speech

et al. 1992


Abstract

This paper describes a mel-cepstral analysis method and its adaptive algorithm. In the proposed method, we apply the criterion used in the unbiased estimation of log spectrum to the spectral model represented by the mel-cepstral coefficients. To solve the non-linear minimization problem involved in the method, we give an iterative algorithm whose convergence is guaranteed. Furthermore, we derive an adaptive algorithm for the mel-cepstral analysis by introducing an instantaneous estimate for the gradient of the criterion. The adaptive mel-cepstral analysis system is implemented with an IIR adaptive filter which has an exponential transfer function and whose stability is guaranteed. We also present examples of speech analysis and results of an isolated word recognition experiment.

Introduction

The spectrum represented by the mel-cepstral coefficients has frequency resolution similar to that of the human ear, which has high resolution at low frequencies [1]. As a result, mel-cepstral coefficients are useful for speech synthesis and recognition. Several methods have been proposed for obtaining mel-cepstral coefficients. For example, the mel-cepstral coefficients can be obtained from the LPC coefficients by spectral resampling. However, no rigorous method has been proposed in which the spectral model is represented by mel-cepstral coefficients and a spectral criterion is minimized.

In this paper, we propose a mel-cepstral analysis method and its adaptive algorithm. In the mel-cepstral analysis method, the model spectrum is represented by the M-th order mel-cepstral coefficients, and the criterion used in the unbiased estimation of log spectrum [2] is minimized with respect to the mel-cepstral coefficients. The minimization problem is solved efficiently by an iterative technique using the FFT, recursion formulas, and a fast algorithm that requires O(M^2) arithmetic operations. We can show that the convergence is quadratic and that a few iterations are typically sufficient to obtain the solution.

Furthermore, we present an adaptive algorithm for the mel-cepstral analysis. To derive the adaptive algorithm, we introduce an instantaneous estimate for the gradient of the criterion in a manner similar to the LMS algorithm [3]. The adaptive analysis system is implemented with an IIR adaptive filter which has the structure of the MLSA filter. We show examples of analysis for synthetic and speech signals. To evaluate the proposed methods, an isolated word recognition experiment was carried out.


Speech parameter generation from HMM using dynamic features


Average-Voice-Based Speech Synthesis Using HSMM-Based Speaker Adaptation and Adaptive Training

Yamagishi

Kobayashi

2007

IEICE Transactions on Information and Systems

141

114


Expression of the endoplasmic reticulum molecular chaperone (ORP150) rescues hippocampal neurons from glutamate toxicity

Kitao

Ozawa

Miyazaki

et al. 2001

J. Clin. Invest.

126

102


Clinical review of the Japanese experience with boron neutron capture therapy and a proposed strategy using epithermal neutron beams

et al. 2003


Our concept of boron neutron capture therapy (BNCT) is selective destruction of tumor cells using the heavy-charged particles yielded through 10B(n, alpha)7Li reactions. To design a new protocol that employs epithermal neutron beams in the treatment of glioma patients, we examined the relationship between the radiation dose, histological tumor grade, and clinical outcome. Since 1968, 183 patients with different kinds of brain tumors were treated by BNCT; for this retrospective study, we selected 105 patients with glial tumors who were treated in Japan between 1978 and 1997. In the analysis of side effects due to radiation, we included all the 159 patients treated between 1977 and 2001. With respect to the radiation dose (i.e. physical dose of boron n-alpha reaction), the new protocol prescribes a minimum tumor volume dose of 15 Gy or, alternatively, a minimum target volume dose of 18 Gy. The maximum vascular dose should not exceed 15 Gy (physical dose of boron n-alpha reaction) and the total amount of gamma rays should remain below 10 Gy, including core gamma rays from the reactor and capture gamma in brain tissue. The outcomes for 10 patients who were treated by the new protocol using a new mode composed of thermal and epithermal neutrons are reported.


Radiobiological evidence suggesting heterogeneous microdistribution of boron compounds in tumors: Its relation to quiescent cell population and tumor cure in neutron capture therapy

Ono

Masunaga

Kinashi

et al. 1996

International Journal of Radiation Oncology*Biology*Physics


