Text-to-speech (TTS) systems are built on speech corpora whose phoneme labels are carefully checked and segmented. However, the phoneme sequences generated by automatic grapheme-to-phoneme converters at synthesis time are usually inconsistent with those of the corpus, which degrades the quality of the synthetic speech. To solve this problem, the present work adapts automatically generated pronunciations to the corpus. The main idea is to train corpus-specific phoneme-to-phoneme conditional random fields with a large set of linguistic, phonological, articulatory and acoustic-prosodic features. Features are first selected under cross-validation, then combined to produce the final best feature set. The pronunciation models are evaluated in terms of phoneme error rate and through perceptual tests. Experiments carried out on a French speech corpus show an improvement in the quality of synthetic speech when the pronunciation models are included in the phonetization process. Apart from improving TTS quality, the presented pronunciation adaptation method also opens interesting perspectives for expressive speech synthesis.
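The phoneme error rate mentioned above is typically computed as the Levenshtein distance between the predicted and the corpus-labeled phoneme sequences, normalized by the reference length. A minimal sketch (the phoneme symbols and the example word are illustrative, not taken from the paper's corpus):

```python
def phoneme_error_rate(ref, hyp):
    """Edit distance (substitutions + insertions + deletions) between two
    phoneme sequences, normalized by the reference length."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n] / max(m, 1)

# Illustrative example: a canonical pronunciation vs. a realized one
# with a schwa elision (one deletion over six reference phonemes).
ref = ["m", "E~", "t", "@", "n", "A~"]
hyp = ["m", "E~", "t", "n", "A~"]
per = phoneme_error_rate(ref, hyp)  # 1/6
```

A lower rate against the corpus labels indicates pronunciations more consistent with the voice corpus, which is the quantity the adaptation aims to reduce.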
Abstract. In the field of expressive speech synthesis, much work has been conducted on suprasegmental prosodic features, while little has been done on pronunciation variants. However, prosody is highly related to the sequence of phonemes to be expressed. This article raises two issues in the generation of emotional pronunciations for TTS systems. The first is to design an automatic method for generating pronunciations from text, while the second addresses the very existence of emotional pronunciations, investigated through experiments on emotional speech. To this end, we present an innovative pronunciation adaptation method which automatically adapts canonical phonemes first to those labeled in the corpus used to build the synthetic voice, then to those labeled in an expressive corpus. The method consists in training conditional random field pronunciation models with prosodic, linguistic, phonological and articulatory features. The analysis of emotional pronunciations reveals strong dependencies between prosody and phoneme assimilations or elisions. According to perceptual tests, this double adaptation makes it possible to synthesize expressive speech samples of good quality, but emotion-specific pronunciations are too subtle to be perceived by listeners.
In this study, we propose a script selection approach for designing TTS speech corpora. A deep convolutional neural network (DCNN) projects linguistic information into an embedding space. The embedded representation of the corpus is then fed to a selection process which extracts a subset of utterances offering good linguistic coverage while limiting the repetition of linguistic units. We present two selection processes: a clustering approach based on inter-utterance distance, and a method that aims to reach a target distribution of linguistic events. We compare the synthetic signal quality of the proposed methods to state-of-the-art methods, both objectively and subjectively. Both kinds of measures confirm that the proposed methods yield speech corpora with better synthetic speech quality. The perceptual test shows that our global TTS cost can be used as an alternative measure of overall synthetic quality.
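Coverage-oriented script selection of this kind is often approximated greedily: at each step, pick the utterance that contributes the most unseen linguistic units (here, diphones) while penalizing repetition of units already covered. A minimal sketch under those assumptions; the penalty weight and the toy phoneme sequences are illustrative, not the paper's actual method:

```python
from collections import Counter

def diphones(phonemes):
    """Adjacent phoneme pairs of an utterance."""
    return list(zip(phonemes, phonemes[1:]))

def greedy_select(utterances, n_select, rep_weight=0.1):
    """Greedily pick utterances maximizing new-diphone coverage,
    with a small penalty for repeating already-covered diphones."""
    covered = Counter()
    selected = []
    remaining = list(range(len(utterances)))
    for _ in range(min(n_select, len(utterances))):
        def score(i):
            units = diphones(utterances[i])
            gain = sum(1 for u in units if covered[u] == 0)
            repetition = sum(covered[u] for u in units)
            return gain - rep_weight * repetition
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)
        covered.update(diphones(utterances[best]))
    return selected

# Toy corpus of phonemized utterances.
corpus = [["b", "o~", "z", "u", "r"],
          ["s", "a", "l", "y"],
          ["b", "o~", "s", "w", "a", "r"]]
picked = greedy_select(corpus, 2)
```

Targeting a full distribution of linguistic events, as in the paper's second selection process, would replace the binary coverage gain with a distance between the selected subset's unit distribution and the target one.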
Abstract. Pronunciation adaptation consists in predicting pronunciation variants of words and utterances based on their standard pronunciation and a target style. This is a key issue in text-to-speech synthesis, as such variants bring expressiveness to synthetic speech, especially for a spontaneous style. This paper presents a new pronunciation adaptation method which adapts standard pronunciations to the style of individual speakers in a context of spontaneous speech. Its originality and strength lie in relying solely on linguistic features and in using a probabilistic machine learning framework, namely conditional random fields, to produce the adapted pronunciations. Features are first selected through a series of experiments, then combined to produce the final adaptation method. Backend experiments on the Buckeye conversational English speech corpus show that adapted pronunciations reflect spontaneous speech significantly better than standard ones, and that even better results could be achieved by considering alternative predictions.
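Linear-chain CRFs of the kind used in these adaptation methods are usually fed one feature dictionary per phoneme position, built from a context window over the canonical sequence. A minimal sketch of that feature-extraction step (the window size, feature names, and ARPAbet-style symbols are illustrative assumptions; the papers combine such linguistic context with prosodic, phonological and articulatory features not shown here):

```python
def phoneme_features(seq, i, window=2):
    """Context-window features for position i of a canonical phoneme
    sequence, in the dict-per-position form a linear-chain CRF toolkit
    (e.g. CRFsuite) typically consumes."""
    feats = {"bias": 1.0, "phoneme": seq[i]}
    for k in range(1, window + 1):
        feats[f"prev{k}"] = seq[i - k] if i - k >= 0 else "<s>"
        feats[f"next{k}"] = seq[i + k] if i + k < len(seq) else "</s>"
    return feats

# Canonical phonemes of an utterance; labels for training would be the
# corresponding realized (corpus-labeled) phonemes, aligned beforehand.
canonical = ["dh", "ih", "s", "ih", "z"]
X = [phoneme_features(canonical, i) for i in range(len(canonical))]
```

At training time, each feature sequence is paired with the aligned realized phoneme sequence as its label sequence; at synthesis time, the CRF decodes the most likely realized pronunciation from the canonical one.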