2010
DOI: 10.1587/transinf.e93.d.2483

HMM-Based Voice Conversion Using Quantized F0 Context

Authors: Takashi NOSE (Member), Yuhei OTA (Nonmember), and Takao KOBAYASHI (Member)

Abstract: We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetical…

Cited by 11 publications (2 citation statements)
References 17 publications
“…One typical method is VocaListener [25] in singing voice synthesis, which is capable of automatically optimizing input manual parameters of a singing voice synthesizer using singing voices sung by a user. The VC method using HMM for speaking voices [26] may also be used for this purpose.…”
Section: Speech and Text Input
Citation type: mentioning
confidence: 99%
“…To improve the accuracy, the authors introduce phone-duration prediction using random forests [6] which is a kind of ensemble training [7]. Finally, speech parameter generation with mora-based emphasis context is presented to preserve rich intonation of natural speech, which is a variation of quantized fundamental frequency (F0) context [8] used also in voice conversion [9] and very low bit-rate speech coding [10].…”
Section: Introduction
Citation type: mentioning
confidence: 99%