Towards expressive prosody generation in TTS for reading aloud applications

Domínguez, Mónica; Burga, Alicia; Farrús, Mireia; Wanner, Leo

doi:10.21437/iberspeech.2018-9

Cited by 2 publications

(2 citation statements)

References 14 publications

(29 reference statements)

“…We highlight the relevance of exploring formal representations of thematicity, such as the MTT's and we foresee promising outcomes when used as basis for implementation of communicatively-oriented models in TTS and conversational agent applications. Preliminary experiments have been carried out to implement a thematicity-to-prosody module in English and German (Domínguez et al 2017, 2018). In this context, it should be, however, noted that these supervised classification experiments are different from an actual implementation of a prosody module in a TTS application.…”

Section: Discussion (mentioning)

confidence: 99%

“…For this purpose, we carry out machine learning-based classification experiments on a spoken language corpus, which consists of an extract of 109 isolated sentences from the popular Wall Street Journal (WSJ) corpus (Charniak et al 2000), read aloud by native speakers of English. We opted for a reading-aloud setup because one of our applications is a "reading aloud" agent (Domínguez et al 2018), and deficiencies in expressive prosody in TTS become evident with the syntactically demanding genre of newspaper material. The sentences in our corpus are annotated with their thematicity structure (both MTT's tripartite hierarchical thematicity and the flat binary theme-rheme dichotomy, which constitutes the state of the art in speech technologies and which we use as the reference thematicity structure) and with their prosodic structure (in terms of acoustic parameter-oriented labels automatically derived from three prosodic elements, namely, F0, intensity and rhythm, and in terms of ToBI labels).…”

Section: Introduction (mentioning)

confidence: 99%

The Information Structure–prosody interface in text-to-speech technologies. An empirical perspective

Domínguez

Farrús

Wanner

2021

Corpus Linguistics and Linguistic Theory

Self Cite

The correspondence between a speaker's communicative intention, in terms of Information Structure, and the way the speaker conveys communicative aspects by means of prosody has been a fruitful field of study in Linguistics. However, text-to-speech applications still lack the variability and richness found in human speech in terms of how humans display their communication skills. Some attempts were made in the past to model one aspect of Information Structure, namely thematicity, for its application to intonation generation in text-to-speech technologies. Yet, these applications suffer from two limitations: (i) they draw upon a small number of made-up simple question-answer pairs rather than on real (spoken or written) corpus material; and (ii) they do not explore whether any other interpretation would better suit a wider range of textual genres beyond dialogs. In this paper, two different interpretations of thematicity in the field of speech technologies are examined: the state-of-the-art binary (and flat) theme-rheme, and the hierarchical thematicity defined by Igor Mel’čuk within the Meaning-Text Theory. The outcome of the experiments on a corpus of native speakers of US English suggests that the latter interpretation of thematicity has a versatile implementation potential for text-to-speech applications of the Information Structure–prosody interface.

Características prosódicas associadas aos sinais de pontuação

Galdino

Silva

Oliveira

2021

CadLin

The aim of this article is to present a scoping review of the prosodic features associated with punctuation marks. A literature survey was carried out by searching descriptors in English and Portuguese, organized according to the following syntax: prosódia AND acústica AND discurso AND estrutura AND ("sinais de pontuação" OR "pontuação gráfica" OR "sinal de pontuação"), excluding citations and patents, in the following databases: OvidMedlin, Public Medicine Library (PubMed), Scopus (Elsevier), Ebscohost (Academic Search Premier), Gale Academic Online and Google Scholar. We observed that a variety of methods are used to analyze the correlation between punctuation marks and prosodic features. The studies in this review confirmed our research question, evidencing the relationship between punctuation marks and prosodic aspects. Most of the technology-related works developed different neural networks to convert text into speech and/or speech into text, and showed that pauses are identified as the strongest indicators of punctuation marks.
