Ranniery Maia scite author profile

The paper presents a mechanism to perform speaker adaptation in speech synthesis based on deep neural networks (DNNs). The mechanism extracts speaker identification vectors, socalled d-vectors, from the training speakers and uses them jointly with the linguistic features to train a multi-speaker DNNbased text-to-speech synthesizer (DNN-TTS). The d-vectors are derived by applying principal component analysis (PCA) on the bottleneck features of a speaker classifier network. At the adaptation stage, three variants are explored: (1) d-vectors calculated using data from the target speaker, or (2) d-vectors calculated as a weighted sum of d-vectors from training speakers, or (3) d-vectors calculated as an average of the above two approaches. The proposed method of unsupervised adaptation using the d-vector is compared with the commonly used i-vector based approach for speaker adaptation. Listening tests show that: (1) for speech quality, the d-vector based approach is significantly preferred over the i-vector based approach. All the d-vector variants perform similar for speech quality; (2) for speaker similarity, both d-vector and i-vector based adaptation were found to perform similar, except a small significant preference for the d-vector calculated as an average over the i-vector.

show abstract

Cepstral analysis based on the glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise

Valentini-Botinhao

Maia

Yamagishi

et al. 2012

View full text Add to dashboard Cite

In this paper we introduce a new cepstral coefficient extraction method based on an intelligibility measure for speech in noise, the Glimpse Proportion measure. This new method aims to increase the intelligibility of speech in noise by modifying the clean speech, and has applications in scenarios such as public announcement and car navigation systems. We first explain how the Glimpse Proportion measure operates and further show how we approximated it to integrate it into an existing spectral envelope parameter extraction method commonly used in the HMM-based speech synthesis framework. We then demonstrate how this new method changes the modelled spectrum according to the characteristics of the noise and show results for a listening test with vocoded and HMM-based synthetic speech. The test indicates that the proposed method can significantly improve intelligibility of synthetic speech in speech shaped noise.Index Terms-cepstral coefficient extraction, objective measure for speech intelligibility, Lombard speech, HMM-based speech synthesis

show abstract

A rule-based grapheme-phone converter and stress determination for Brazilian Portuguese natural language processing

Silva

Lima

Maia

et al. 2006

View full text Add to dashboard Cite

Complex cepstrum for statistical parametric speech synthesis

Maia

Akamine

Gales

2013

Speech Communication

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Ranniery Maia

A trainable excitation model for HMM-based speech synthesis

Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors

Cepstral analysis based on the glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise

A rule-based grapheme-phone converter and stress determination for Brazilian Portuguese natural language processing

Complex cepstrum for statistical parametric speech synthesis

Contact Info

Product

Resources

About