Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones

Charpentier, F.; Moulines, Éric

doi:10.21437/eurospeech.1989-172

Cited by 99 publications

(14 citation statements)

References 0 publications


“…A phrase-level concatenative synthesizer was designed for the PEGASUS in much the same way as was done for the WHEELS system. For example, flight numbers can be synthesized from smaller constituents (e.g., 4695 can be synthesized from 3[4]7, 7 [6]41, 8 [9]2, and 56 [5].) Multiple carrier phrases were designed for the speaking of estimated arrival and departure times, for example.…”

Section: Full Sentence Experiments (mentioning)

confidence: 99%


Natural-sounding speech synthesis using variable-length units

1998

5th International Conference on Spoken Language Processing (ICSLP 1998)


The goal of this work was to develop a speech synthesis system which concatenates variable-length units to create natural-sounding speech. Our initial work in this area showed that by careful design of system responses to ensure consistent intonation contours, natural-sounding speech synthesis was achievable with word- and phrase-level concatenation. In order to extend the flexibility of this framework, subsequent work focused on the problem of generating novel words from a pre-recorded corpus of sub-word units. The design of the sub-word units was motivated by perceptual experiments that investigated where speech could be spliced with minimal distortion and what contextual constraints were necessary to maintain in order to produce natural-sounding speech. This sub-word corpus is then searched at synthesis time with a Viterbi search which selects a sequence of units based on how well they individually match the input specification and on how well they sound as an ensemble. This concatenative speech synthesis system, ENVOICE, has been used in a conversational information retrieval system in two application domains to convert meaning representations into speech waveforms.
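The Viterbi search described in the abstract above can be illustrated with a minimal sketch. This is not the ENVOICE implementation: the function name `select_units`, the cost callbacks `target_cost` and `join_cost`, and the representation of units as arbitrary Python objects are all assumptions made for illustration. The sketch only shows the general idea of picking one unit per position so that the sum of per-unit match costs and pairwise concatenation costs is minimal.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Pick one unit per position with minimal total cost (hypothetical API).

    target_cost(t, u): how poorly unit u matches the specification at position t.
    join_cost(p, u):   how poorly unit u sounds when spliced after unit p.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate unit")

    # best[j]: lowest total cost of any path ending in candidate j
    # of the position processed most recently.
    best = [target_cost(0, u) for u in candidates[0]]
    # back[t - 1][j]: index of the best predecessor of candidate j at position t.
    back: List[List[int]] = []

    for t in range(1, len(candidates)):
        prev_units = candidates[t - 1]
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[t]:
            costs = [best[i] + join_cost(p, u) for i, p in enumerate(prev_units)]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + target_cost(t, u))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last position to the first.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]


# Toy usage: units are (label, pitch in Hz) pairs; the join cost is the pitch jump.
toy_candidates = [
    [("a", 100), ("a", 120)],
    [("b", 118), ("b", 90)],
    [("c", 119)],
]
chosen = select_units(
    toy_candidates,
    target_cost=lambda t, u: 0.0,
    join_cost=lambda p, u: abs(p[1] - u[1]),
)
```

In the toy usage, every unit matches its position equally well, so the search is driven by the join cost alone and favours the sequence with the smallest pitch jumps at the splice points.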


“…This can be corrected after the fact by prosodic modification algorithms. An example of an algorithm which happens to operate in the time domain is the Time-Domain Pitch-Synchronous Overlap-and-Add algorithm (TD-PSOLA) [5,14].…”

Section: Introduction (mentioning)

confidence: 99%

Natural-sounding speech synthesis using variable-length units

1998

5th International Conference on Spoken Language Processing (ICSLP 1998)


“…In the other end of the synthesizer continua we have the PSOLA type of method (Carpentier & Moulines, 1989). The algorithms are based on a pitch-synchronous overlap-add approach for modifying the speech prosody and concatenating diphone waveforms.…”

Section: Synthesizers and Control Parameters (mentioning)

confidence: 99%

Synthesis: modelling variability and constraints

Carlson

1991

2nd European Conference on Speech Communication and Technology (Eurospeech 1991)


“…This is because a previous study reported that PSOLA conducted directly on speech waveforms sometimes causes spectral distortion and leads to the speech quality degradation [1]. And this spectral distortion is thought to be able to be suppressed by doing the re-arrangement on source waveforms [4]. In our previous study [3], source waveforms were obtained by using LMA (Log Magnitude Approximation) inverse filter, which was designed only by using cepstrum coefficients and could precisely approximate magnitude characteristics of vocal tract in a logarithmic scale [5].…”

Section: Introduction (mentioning)

confidence: 99%

Quality improvement of PSOLA analysis-synthesis using partial zero-phase conversion

Minematsu, Nakagawa

2000

6th International Conference on Spoken Language Processing (ICSLP 2000)


This paper discusses two issues of the quality improvement of F0-modified speech based upon PSOLA analysis-synthesis. Previous studies [1][2] pointed out that the location of a window of PSOLA influences the quality of synthesized speech and one of them claimed that the center of a window should be located at a pitch pulse in source waveforms. However, pitch pulse detection sometimes fails due to undesired acoustic events. In this paper, several methods are experimentally examined to reduce pitch pulse detection errors. Even when the detection is done correctly, F0-modified re-synthesized speech sometimes causes "echoes" in the re-arranged waveforms. This is mainly caused by a pitch pulse with small sharpness or by that with two relatively high pulses, not pitch pulses, before and after it. To suppress the echoes with little loss of naturalness, partial zero/π-phase conversion is proposed here. Experiments show the high validity of the proposed methods in improving the quality of re-synthesized speech.
