2016
DOI: 10.1587/transinf.2015edp7457

WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications

Abstract: A vocoder-based speech synthesis system, named WORLD, was developed in an effort to improve the sound quality of real-time applications using speech. Speech analysis, manipulation, and synthesis on the basis of vocoders are used in various kinds of speech research. Although several high-quality speech synthesis systems have been developed, real-time processing has been difficult with them because of their high computational costs. This new speech synthesis system has not only sound quality but also quick…

Cited by 976 publications (646 citation statements)
References 33 publications
“…We use an acoustic frontend based on the WORLD vocoder [21] (D4C edition [22]) with a 32 kHz sample rate and 5 ms hop time. The dimensionality of the harmonic component is reduced to 60 log Mel-Frequency Spectral Coefficients (MFSCs) by truncated frequency warping in the cepstral domain [23] with an all-pole filter with warping coefficient α = 0.45.…”
Section: Acoustic and Control Frontend (mentioning)
confidence: 99%
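
The frontend settings quoted above (32 kHz sample rate, 5 ms hop time) fix how many samples separate consecutive analysis frames. The following is a minimal sketch of that arithmetic, not code from the cited paper or from the WORLD package; the function names are hypothetical, and the frame-count convention (one frame at time zero plus one per full hop) is an assumption.

```python
def hop_samples(sample_rate_hz: int, hop_ms: float) -> int:
    """Number of samples between consecutive analysis frames."""
    return round(sample_rate_hz * hop_ms / 1000.0)


def frame_count(num_samples: int, sample_rate_hz: int, hop_ms: float) -> int:
    """Frames needed to cover a signal.

    Assumed convention: one frame at t = 0 plus one per full hop.
    A given analysis tool may count frames differently.
    """
    return num_samples // hop_samples(sample_rate_hz, hop_ms) + 1


hop = hop_samples(32000, 5.0)                  # samples per 5 ms hop at 32 kHz
frames_per_second = frame_count(32000, 32000, 5.0)  # frames for 1 s of audio
print(hop, frames_per_second)
```

Under these assumptions, each frame advances by 160 samples, so the 60 MFSCs mentioned in the quote would be produced roughly 200 times per second of audio.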
“…Some vocoders have been investigated from a simple mel-log spectrum approximate (MLSA) filter with a simple pulse excitation and mel-cepstrum [1] to high-quality ones, such as STRAIGHT [2] and WORLD [3]. However, these high-quality vocoders are intended to analyze and convert high-quality speech and a number of acoustic parameters necessary to synthesize speech with the same quality as the original, but not for TTS.…”
Section: Introduction (mentioning)
confidence: 99%
“…DNN improves synthesis accuracy compared to the conventional hidden Markov model (HMM) [5,6]. Additionally corpus-dependent high-quality vocoders with DNNs have been investigated [7,8], whereas the conventional high-quality ones described above [2,3] are corpus-independent. Although corpus-dependent high-quality vocoders with DNNs improve the speech quality compared to the conventional STRAIGHT vocoder in both HMM- and DNN-based speech synthesis [7], the synthesis quality depends greatly on the estimation accuracy of the glottal closure instants [9].…”
Section: Introduction (mentioning)
confidence: 99%
“…The speech signal sampling rate was 22,050 Hz. The WORLD [23,24] package was used in speech analysis. From a speech signal, 35-dimensional mel-cepstrum parameters including the 0th power coefficient, F0 values, and 513-dimensional aperiodicity features, which were coded into two-band aperiodicity parameters, were used.…”
Section: Experimental Conditions (mentioning)
confidence: 99%
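
The figures in the last quote (22,050 Hz sampling rate, 513-dimensional aperiodicity features) can be related by simple arithmetic. The sketch below assumes the 513 values are the bins of a one-sided spectrum of a real signal, so that the bin count equals half the FFT size plus one; the quote does not state this, and the function names are hypothetical.

```python
def fft_size_from_bins(num_bins: int) -> int:
    """FFT size implied by a one-sided spectrum with num_bins bins.

    Assumes num_bins = fft_size / 2 + 1 (real-valued input signal).
    """
    return 2 * (num_bins - 1)


def bin_spacing_hz(sample_rate_hz: float, fft_size: int) -> float:
    """Frequency distance between adjacent spectral bins."""
    return sample_rate_hz / fft_size


fft_size = fft_size_from_bins(513)          # implied FFT size
spacing = bin_spacing_hz(22050, fft_size)   # bin spacing in Hz
print(fft_size, round(spacing, 2))
```

Under that assumption the implied FFT size is 1024 and adjacent bins are about 21.5 Hz apart, which gives a sense of how much the two-band coding mentioned in the quote compresses the aperiodicity representation.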