Natural-sounding speech synthesis using variable-length units

Yi, Jon Rong-Wei

doi:10.21437/icslp.1998-575

Cited by 34 publications

(5 citation statements)

References 22 publications


“…The past research revealed that more natural-sounding speech is obtained if prosodic information is included in unit selection [7]. However, signal processing techniques employed to do prosodic modifications were reported to reduce the quality of synthesized speech [1,8,29].…”

Section: Source Of Prosody In Concatenative TTS (mentioning)

confidence: 99%

“…Past research has shown that natural-sounding synthetic speech can be produced by selecting non-uniform units (i.e. units of variable length) from large speech databases [1,2,6,29,17,20]. Studies [2,6,29,25] indicate that the naturalness of synthetic speech can be improved by excising longer sequences of recorded speech from the database, this reduces the number of concatenation points in a synthesized utterance.…”

Section: Introduction (mentioning)

confidence: 99%

“…units of variable length) from large speech databases [1,2,6,29,17,20]. Studies [2,6,29,25] indicate that the naturalness of synthetic speech can be improved by excising longer sequences of recorded speech from the database, this reduces the number of concatenation points in a synthesized utterance. It was argued by [26] that longer chunks of recorded speech preserve natural rhythm and prosody better than shorter sequences.…”

Section: Introduction (mentioning)

confidence: 99%
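The point made in the statement above, that longer excised sequences mean fewer concatenation points, can be illustrated with a toy example. This is a minimal sketch only: the greedy longest-match strategy, the function name `greedy_cover`, and the letter-per-phone representation are assumptions made for illustration, not the selection procedure of the cited work.

```python
def greedy_cover(target, database):
    """Cover `target` (a list of phone labels) with the longest contiguous
    runs found in any database utterance, scanning left to right.

    Hypothetical illustration: real unit selection typically also weighs
    acoustic and prosodic costs, which are ignored here.
    """
    units = []
    i = 0
    while i < len(target):
        best = 0
        for utt in database:
            for start in range(len(utt)):
                # Length of the match between utt[start:] and target[i:].
                k = 0
                while (i + k < len(target) and start + k < len(utt)
                       and utt[start + k] == target[i + k]):
                    k += 1
                best = max(best, k)
        if best == 0:
            raise ValueError(f"phone {target[i]!r} not found in database")
        units.append(tuple(target[i:i + best]))
        i += best
    return units


# Toy data: each letter stands in for one phone label.
database = [list("help"), list("low")]
units = greedy_cover(list("hello"), database)
concatenation_points = len(units) - 1
```

With this toy database the five-phone target is covered by two units, so there is one concatenation point; joining the same target phone by phone would need four.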


Automatic construction of a prosodically rich text corpus for speech synthesis systems

Lambert

2006

Speech Prosody 2006


This paper presents a method for an automatic compilation of a phonologically rich text database, which is used in a concatenative text-to-speech (TTS) synthesis system. In this method, linguistic features are predicted from text using Festival's linguistic engine. A set of phonological units for a specific text is compiled from attribute value lists (AVLs). Phrases/sentences that contain the phonological units that are not included in the database are added to the database. This is an efficient way for generating database prompts with a specific prosodic content; the prompts can then be recorded and converted into voice. The method described here can be used for languages other than English.
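The selection step described in this abstract (add a sentence when it contains phonological units the database does not yet have) can be sketched as a single greedy pass. This is a minimal sketch under stated assumptions: the units are taken to be already extracted as plain strings, whereas the abstract derives them from Festival's attribute value lists, and the name `select_prompts` and the unit labels are hypothetical.

```python
def select_prompts(candidates, covered=None):
    """Keep each candidate sentence that contributes at least one unit
    not yet present in the database.

    `candidates` is an iterable of (sentence, units) pairs; `covered` is
    the set of units the database already contains. Both the data layout
    and the unit labels are assumptions made for illustration.
    """
    covered = set(covered or ())
    selected = []
    for sentence, units in candidates:
        new_units = set(units) - covered
        if new_units:
            selected.append(sentence)
            covered |= new_units
    return selected, covered


# Hypothetical unit labels: phone plus a stress attribute.
candidates = [
    ("sentence one", ["ae+stressed", "t+unstressed"]),
    ("sentence two", ["t+unstressed"]),
    ("sentence three", ["t+unstressed", "iy+stressed"]),
]
selected, covered = select_prompts(candidates)
```

Here the second sentence is skipped because it adds nothing new. A single pass depends on candidate order; sorting candidates by how many new units they add would be one possible refinement.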


“…The original synthesis corpus [6] was converted by lowering the fundamental frequency by 30% and the spectrum (formants) by 25%. Thus, the spectral envelope was interpolated by a factor of [value garbled in source].…”

Section: Male Female Conversion (mentioning)

confidence: 99%

Voice transformations: from speech synthesis to mammalian vocalizations

Tang, Wang, Seneff

2001

7th European Conference on Speech Communication and Technology (Eurospeech 2001)


This paper describes a phase vocoder based technique for voice transformation. This method provides a flexible way to manipulate various aspects of the input signal, e.g., fundamental frequency of voicing, duration, energy, and formant positions, without explicit F0 extraction. The modifications to the signal can be specific to any feature dimensions, and can vary dynamically over time. There are many potential applications for this technique. In concatenative speech synthesis, the method can be applied to transform the speech corpus to different voice characteristics, or to smooth any pitch or formant discontinuities between concatenation boundaries. The method can also be used as a tool for language learning. We can modify the prosody of the student's own speech to match that from a native speaker, and use the result as guidance for improvements. The technique can also be used to convert other biological signals, such as killer whale vocalizations, to a signal that is more appropriate for human auditory perception. Our initial experiments show encouraging results for all of these applications.


“…We focus on Cantonese, a major Chinese dialect predominant in Hong Kong, South China and many overseas Chinese communities. The corpus-based concatenation technique has been gaining popularity in speech synthesis [2][3][4][5][6] due to its ability to achieve a high degree of naturalness. The use of corpus-based syllable concatenation is particularly suitable for Chinese, since the language is monosyllabic in nature.…”

Section: Introduction (mentioning)

confidence: 99%

CU VOCAL: corpus-based syllable concatenation for Chinese speech synthesis across domains and dialects

Meng, Keung, Siu et al.

2002

7th International Conference on Spoken Language Processing (ICSLP 2002)


This paper describes CU VOCAL, a Chinese text-to-speech synthesis system that adopts the approach of corpus-based syllable concatenation. We have demonstrated the applicability of the approach primarily for Cantonese, a major dialect of Chinese predominant in Hong Kong, South China and many overseas Chinese communities. This work extends our previous work as described in [1]. Our approach is able to synthesize speech from free-form text, and it can also be optimized for response generation in specific application domains. We have also demonstrated the portability of the approach to Putonghua, the official Chinese dialect, in a domain-optimized setting. Coarticulatory context is expressed in terms of distinctive features. Tonal context is also included. We conducted a series of listening tests using CU VOCAL, which gave favorable performance.

