Probablistic modelling of F0 in unvoiced regions in HMM based speech synthesis

Yu, Kai; Toda, Tomoki; Gašić, Milica; Keizer, Simon; Mairesse, François; Thomson, Blaise; Young, Steve

doi:10.1109/icassp.2009.4960448

Cited by 18 publications

(22 citation statements)

References 4 publications

“…By generating real F0 values for unvoiced regions and assuming hidden voicing labels, the Continuous F0 model with Globally Tied Distribution [6], CF-GTD in figure 1(b), is obtained. The state output distribution can be expressed as…”

Section: Comparison Of F0 Modelling Approaches For HMM Based Speech S… (mentioning)

confidence: 99%

“…Experiments have shown that it can greatly reduce the F0 trajectory modelling error, and consequently improve the naturalness of the synthesised speech [6,7]. However, due to hidden voicing labels, voicing classification only relies on the statistical difference between the globally tied unvoiced component and the state specific voiced component.…”

Section: Comparison Of F0 Modelling Approaches For HMM Based Speech S… (mentioning)

confidence: 99%

“…Although different unvoiced F0 value generation approaches have in the past been shown to give similar performance [6,7], all methods have been static. To reflect the randomness property of unvoiced F0, the dynamic random generation approach described above was investigated.…”

Section: Subjective Comparison (mentioning)

confidence: 99%
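
As a rough illustration of the static random generation this statement refers to, the sketch below fills unvoiced frames of a log-F0 track with random values drawn once. The function name `fill_unvoiced`, the use of `None` to mark unvoiced frames, and the Gaussian parameters are all assumptions made for this example; the excerpt does not say which distribution the cited papers sample from, nor how the dynamic variant differs.

```python
import random


def fill_unvoiced(f0_track, mean=5.0, std=0.3, seed=0):
    """Return a copy of f0_track with unvoiced frames filled in.

    Unvoiced frames are marked with None (an assumption of this sketch).
    Each one is replaced by a value drawn from N(mean, std^2); the
    distribution and its parameters are hypothetical, not from the paper.
    Voiced frames are passed through unchanged.
    """
    rng = random.Random(seed)  # fixed seed: values are drawn once (static)
    return [rng.gauss(mean, std) if f0 is None else f0 for f0 in f0_track]


# Toy log-F0 track: three voiced frames around two unvoiced ones.
track = [5.1, 5.2, None, None, 5.0]
filled = fill_unvoiced(track)
```

After this step every frame carries a real value, which is what lets a continuous F0 model treat the whole track as one stream.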

“…Here, continuous F0 is assumed to exist in unvoiced regions and there have been a number of modelling approaches along this line. In [6], random F0 values are used in unvoiced regions and voicing labels are assumed to be hidden. A Gaussian mixture model (GMM) is employed, where unvoiced Gaussian components are globally tied so that the statistical difference between voiced and unvoiced regions can be modelled.…”

Section: Introduction (mentioning)

confidence: 99%
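
The globally tied mixture described in this statement can be sketched as a two-component model per state: one voiced Gaussian owned by the state and one unvoiced Gaussian shared by all states. This is a minimal illustration, not the authors' implementation; the class names, the single-Gaussian voiced component, the mixture weight, and every numeric value are assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    mean: float
    var: float

    def pdf(self, x):
        """Density of a univariate Gaussian at x."""
        norm = math.sqrt(2.0 * math.pi * self.var)
        return math.exp(-0.5 * (x - self.mean) ** 2 / self.var) / norm


# One broad unvoiced component shared (globally tied) by every state.
# The parameters are hypothetical.
TIED_UNVOICED = Gaussian(mean=5.0, var=4.0)


@dataclass(frozen=True)
class State:
    voiced: Gaussian       # state-specific voiced component
    voiced_weight: float   # mixture weight of the voiced component

    def likelihood(self, x):
        """Mixture likelihood of a continuous F0 observation x."""
        return (self.voiced_weight * self.voiced.pdf(x)
                + (1.0 - self.voiced_weight) * TIED_UNVOICED.pdf(x))

    def voiced_posterior(self, x):
        """Posterior probability that x came from the voiced component."""
        return self.voiced_weight * self.voiced.pdf(x) / self.likelihood(x)


state = State(voiced=Gaussian(mean=5.2, var=0.01), voiced_weight=0.8)
p_near = state.voiced_posterior(5.2)  # observation at the voiced mean
p_far = state.voiced_posterior(7.0)   # observation far from the voiced mean
```

Because the voicing label is hidden, the only evidence for voicing in this sketch is how much better the narrow voiced component explains an observation than the broad tied one, which is the statistical difference the statement mentions.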

“…In [7], voicing labels are assumed to be observable and modelled in an independent stream. As the voicing labels are explicitly modelled, global tying as defined in [6] is no longer a requirement for distinguishing voiced regions from unvoiced regions. Both approaches have shown significant improvement in the naturalness of synthesised speech compared to the traditional MSDHMM approach.…”

Section: Introductionmentioning

confidence: 99%

Joint modelling of voicing label and continuous F0 for HMM based speech synthesis

Young

2011

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

Fundamental frequency, or F0, is critical for high quality HMM based speech synthesis. Traditionally, F0 values are considered to depend on a binary voicing decision such that they are continuous in voiced regions and undefined in unvoiced regions. Multi-space distribution HMM (MSDHMM) has been used for modelling the discontinuous F0. Recently, a continuous F0 modelling framework has been proposed and shown to be effective, where continuous F0 observations are assumed to always exist and voicing labels are explicitly modelled by an independent stream. In this paper, a refined continuous F0 modelling approach is proposed. Here, F0 values are assumed to be dependent on voicing labels and both are jointly modelled in a single stream. Due to the enforced dependency, the new method can effectively reduce the voicing classification error. Subjective listening tests also demonstrate that the new approach can yield significant improvements in the naturalness of the synthesised speech. A dynamic random unvoiced F0 generation method is also investigated. Experiments show that it has a significant effect on the quality of synthesised speech.
