Hidden Markov models based on multi-space probability distribution for pitch pattern modeling

Tokuda, Keiichi; Masuko, Takashi; Miyazaki, Noboru; Kobayashi, Takao

doi:10.1109/icassp.1999.758104

Cited by 203 publications

(95 citation statements)

References 7 publications

(4 reference statements)


“…A five-state, left-to-right, no-skip structure was used for the HMMs. The excitation parameters were modeled with multi-space probability distributions HMMs [29] in both the proposed and conventional methods. Each state output probability distribution was modeled by a single Gaussian distribution with a diagonal covariance matrix.…”

Section: Methods (mentioning)

confidence: 99%

Integration of Spectral Feature Extraction and Modeling for HMM-Based Speech Synthesis

Nakamura

Hashimoto

Nankaku

et al. 2014

IEICE Trans. Inf. & Syst.


SUMMARY: This paper proposes a novel approach for integrating spectral feature extraction and acoustic modeling in hidden Markov model (HMM) based speech synthesis. The statistical modeling process of speech waveforms is typically divided into two component modules: the frame-by-frame feature extraction module and the acoustic modeling module. In the feature extraction module, the statistical mel-cepstral analysis technique has been used and the objective function is the likelihood of mel-cepstral coefficients for given speech waveforms. In the acoustic modeling module, the objective function is the likelihood of model parameters for given mel-cepstral coefficients. It is important to improve the performance of each component module for achieving higher quality synthesized speech. However, the final objective of speech synthesis systems is to generate natural speech waveforms from given texts, and the improvement of each component module does not always lead to the improvement of the quality of synthesized speech. Therefore, ideally all objective functions should be optimized based on an integrated criterion which well represents subjective speech quality of human perception. In this paper, we propose an approach to model speech waveforms directly and optimize the final objective function. Experimental results show that the proposed method outperformed the conventional methods in objective and subjective measures. Key words: integrative model, HMM-based speech synthesis, acoustic modeling, mel-cepstral analysis, trajectory HMM
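The citation statement for this entry describes a five-state, left-to-right, no-skip HMM whose state outputs are single Gaussians with diagonal covariance. A minimal sketch of that topology follows, assuming NumPy is available. The function names, the self-loop probability of 0.6, and the absorbing final state are illustrative assumptions, not values taken from the cited paper.

```python
import numpy as np


def left_to_right_transitions(n_states=5, self_loop=0.6):
    """Transition matrix for a left-to-right HMM with no skips.

    Each state may only stay where it is or advance to the next state.
    `self_loop` is a hypothetical value chosen for illustration.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = self_loop            # stay in state i
        A[i, i + 1] = 1.0 - self_loop  # advance to state i + 1
    A[-1, -1] = 1.0  # final state absorbs here (exit transition omitted)
    return A


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a single Gaussian with diagonal covariance."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))


if __name__ == "__main__":
    print(left_to_right_transitions())
```

Only the main diagonal and the first superdiagonal are non-zero, which is what "no-skip" means in this sketch: a state can never jump over its successor.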


“…The HMMs were composed of five states with no-skip left-to-right transitions, with one Gaussian mixture for each state. The MSDHMMs [38], were used for the F0 modelling. The Mel Log Spectrum Approximation (MLSA) filter [39] was used for the synthesis from the generated speech parameters.…”

Section: Design Of a Cross-language Mapped Synthetic Voice (mentioning)

confidence: 99%

Towards automatic cross-lingual acoustic modelling applied to HMM-based speech synthesis for under-resourced languages

Justin

Žibert

2016

Automatika


Original scientific paper. Nowadays Human Computer Interaction (HCI) can also be achieved with voice user interfaces (VUIs). To enable devices to communicate with humans by speech in the user's own language, low-cost language portability is often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for VUI-capable applications is the target-language speech-data acquisition. Such data is further used in the development of VUI subsystems, especially of speech-recognition and speech-production systems. A tempting way to bypass the long process of data acquisition is to consider the design and development of automatic algorithms that can extract similar target-language acoustics from speech databases of different languages. This paper focuses on the cross-lingual phoneme mapping between an under-resourced and a well-resourced language. It proposes a novel automatic phoneme-mapping technique that is adopted from the speaker-verification field. Such a phoneme mapping is further used in the development of the HMM-based speech-synthesis system for the under-resourced language. The synthesised utterances are evaluated subjectively and compared with the expert-knowledge cross-language method and with the baseline speech synthesis built from the under-resourced data alone. The results reveal that combining data from the well-resourced and under-resourced languages using the proposed phoneme-mapping technique can improve the quality of under-resourced language speech synthesis. Key words: Voice user interfaces, Human language technologies, HMM-based speech synthesis, Cross-language synthesis, Under-resourced languages, UBM-MAP-GMM phoneme mapping

Application of automatic cross-lingual acoustic modelling to HMM speech synthesis for under-resourced language databases. Nowadays human-computer interaction (HCI) can also be achieved through voice user interfaces (VUIs). To enable devices and users to communicate by speech in the user's own language, low-cost solutions for porting speech to different languages are often discussed and analysed. One of the most time-consuming parts of the language-adaptation process for VUI-capable applications is the acquisition of speech data for the target language. Such data are then used to develop VUI subsystems, especially for speech recognition and speech production. A tempting way to avoid the lengthy data-acquisition process is to consider the design and development of automatic algorithms that can derive similar acoustic properties for the target language from existing databases of different languages. This paper focuses on cross-lingual phoneme mapping between under-resourced and well-resourced language databases. A new automatic phoneme-mapping technique is proposed, adopted and adapted from the field of speaker verification. This phoneme mapping is then used to develop an HMM-based speech-synthesis system for lesser-known languages. The synthesised utterances were evaluated subjectively through a comparison of cross-lingual ...
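The citation statement for this entry says MSD-HMMs were used for F0 modelling. As a rough illustration of the multi-space idea, the sketch below assumes the simplest two-space setup: a one-dimensional Gaussian space for voiced frames (log F0) and a zero-dimensional space for unvoiced frames, each with a space weight. The function name, the use of `None` for unvoiced frames, and the parameter values are assumptions made for this example only.

```python
import math


def msd_f0_log_output_prob(log_f0, w_voiced, mean, var):
    """Log state-output probability of one F0 observation under a
    two-space multi-space distribution (illustrative sketch).

    log_f0   : float for a voiced frame, None for an unvoiced frame
    w_voiced : weight of the voiced (1-D Gaussian) space, in [0, 1]
    mean,var : Gaussian parameters of the voiced space
    """
    if not 0.0 <= w_voiced <= 1.0:
        raise ValueError("w_voiced must lie in [0, 1]")
    w_unvoiced = 1.0 - w_voiced  # space weights sum to one

    if log_f0 is None:
        # Zero-dimensional space: the probability is the space weight itself.
        return math.log(w_unvoiced) if w_unvoiced > 0.0 else -math.inf

    if w_voiced == 0.0:
        return -math.inf
    # Voiced space: space weight times a 1-D Gaussian density.
    log_density = -0.5 * (math.log(2.0 * math.pi * var) + (log_f0 - mean) ** 2 / var)
    return math.log(w_voiced) + log_density


if __name__ == "__main__":
    print(msd_f0_log_output_prob(None, w_voiced=0.8, mean=5.0, var=0.04))
    print(msd_f0_log_output_prob(5.1, w_voiced=0.8, mean=5.0, var=0.04))
```

With this formulation a single state can score both voiced and unvoiced frames, so no interpolated or placeholder F0 value is needed for unvoiced regions.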


“…The cross valid prior distribution can be determined without tuning parameters. In the HMM-based speech synthesis, the multi-space probability distribution HMMs (MSD-HMMs) [10] have been used to model excitation. However, the cross valid prior distributions for the MSD-HMMs can be determined by using sufficient statistics of each space as equation (18).…”

Section: S S) (mentioning)

confidence: 99%

A Bayesian approach to HMM-based speech synthesis

Hashimoto

Zen

Nankaku

et al. 2009

2009 IEEE International Conference on Acoustics, Speech and Signal Processing


This paper proposes a new framework of speech synthesis based on the Bayesian approach. The Bayesian method is a statistical technique for estimating reliable predictive distributions by marginalizing model parameters. In the proposed framework, all processes for constructing the system can be derived from one single predictive distribution which represents the basic problem of speech synthesis directly. Using HMM as the likelihood function and assuming some approximations, it can be regarded as an application of the variational Bayesian method to the HMM-based speech synthesis. Experimental results show that the proposed method outperforms the conventional one in a subjective test.

