Naturalness Enhancement with Linguistic Information in End-to-End TTS Using Unsupervised Parallel Encoding

Peiró-Lilja, Alex; Farrús, Mireia

doi:10.21437/interspeech.2020-1788

Cited by 4 publications

(1 citation statement)

References 15 publications

“…Phonemic transcription is also essential in speech recognition systems, where the models generally learn representations of the speech signal at phone-level (Zeineldeen et al 2020). For TTS systems, the complete lexical annotation of the orthographic transcript is essential, and many recent studies augment the text input with this annotation and, as a result, enhance the naturalness and adequacy of the output speech (Peiró-Lilja and Farrús 2020; Taylor and Richmond 2020).…”

Section: Introduction (mentioning)

confidence: 99%

RoLEX: The development of an extended Romanian lexical dataset and its evaluation at predicting concurrent lexical information

et al. 2022

In this article, we introduce an extended, freely available resource for the Romanian language, named RoLEX. The dataset was developed mainly for speech processing applications, yet its applicability extends beyond this domain. RoLEX includes over 330,000 curated entries with information regarding lemma, morphosyntactic description, syllabification, lexical stress and phonemic transcription. The process of selecting the list of word entries and semi-automatically annotating the complete lexical information associated with each of the entries is thoroughly described. The dataset’s inherent knowledge is then evaluated in a task of concurrent prediction of syllabification, lexical stress marking and phonemic transcription. The evaluation looked into several dataset design factors, such as the minimum viable number of entries for correct prediction, the optimisation of the minimum number of required entries through expert selection and the augmentation of the input with morphosyntactic information, as well as the influence of each task in the overall accuracy. The best results were obtained when the orthographic form of the entries was augmented with the complete morphosyntactic tags. A word error rate of 3.08% and a character error rate of 1.08% were obtained this way. We show that using a carefully selected subset of entries for training can result in a similar performance to the performance obtained by a larger set of randomly selected entries (twice as many). In terms of prediction complexity, the lexical stress marking posed most problems and accounts for around 60% of the errors in the predicted sequence.

An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis

Lorincz, Stan, Giurgiu 2021, Procedia Computer Science

Características prosódicas associadas aos sinais de pontuação

Galdino, Silva, Oliveira 2021, CadLin

The aim of this article is to present a scoping review of the prosodic characteristics associated with punctuation marks. A literature survey was carried out by searching descriptors in English and Portuguese, organized according to the following syntax: prosódia AND acústica AND discurso AND estrutura AND ("sinais de pontuação" OR "pontuação gráfica" OR "sinal de pontuação"), excluding citations and patents, in the following databases: OvidMedlin, Public Medicine Library (PubMed), Scopus (Elsevier), Ebscohost (Academic Search Premier), Gale Academic Online and Google Scholar. We observed that a variety of methods are used to analyze the correlation between punctuation marks and prosodic characteristics. The studies in this review confirmed our research question, evidencing the relationship between punctuation marks and prosodic aspects. Most of the technology-related works developed different neural networks to convert text to speech and/or speech to text, and showed that pauses are identified as the strongest indicators of punctuation marks.
