2007
DOI: 10.1109/tasl.2007.907344

Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

citations
Cited by 866 publications
(722 citation statements)
references
References 30 publications
“…After applying a weighting matrix W [3] to an input speech parameter sequence x = [x_1, ..., x_T] for calculating its static-dynamic speech feature sequence, the DNNs predict a static-dynamic speech feature sequence of the converted speech. ŷ is generated from the static-dynamic features by using the maximum likelihood-based parameter generation algorithm [2]. We define the above speech parameter conversion as ŷ = G(x).…”
Section: Conventional DNN-based VC (mentioning)
confidence: 99%
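
The statement above describes two steps: a weighting matrix W turns a static sequence into a static-dynamic one, and a maximum likelihood-based parameter generation step recovers a static trajectory ŷ from predicted static-dynamic features. The following is a minimal sketch of that idea, not the cited or citing authors' implementation. It assumes one feature dimension per frame, a single delta window of 0.5 * (x_{t+1} - x_{t-1}) with zero padding at the edges, and a diagonal covariance given as per-element precisions. The function names (`build_window_matrix`, `mlpg`, `matvec`, `solve`) are hypothetical, and `mean` stands in for whatever the acoustic model predicts.

```python
def build_window_matrix(T):
    """Return the 2T x T matrix W with one static row and one delta row per frame.

    Assumption: the delta window is 0.5 * (x[t+1] - x[t-1]), zero-padded at the edges.
    """
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[2 * t][t] = 1.0                  # static row copies x_t
        if t > 0:
            W[2 * t + 1][t - 1] = -0.5     # delta row, left neighbour
        if t < T - 1:
            W[2 * t + 1][t + 1] = 0.5      # delta row, right neighbour
    return W


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):           # back substitution
        tail = sum(M[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (M[r][n] - tail) / M[r][r]
    return y


def mlpg(mean, precision):
    """Return the static trajectory y maximizing the Gaussian likelihood of W y.

    mean, precision: length-2T lists (static, delta interleaved per frame).
    Solves the normal equations (W^T P W) y = W^T P mean with P = diag(precision).
    """
    T = len(mean) // 2
    W = build_window_matrix(T)
    rows = range(2 * T)
    A = [[sum(W[k][i] * precision[k] * W[k][j] for k in rows) for j in range(T)]
         for i in range(T)]
    b = [sum(W[k][i] * precision[k] * mean[k] for k in rows) for i in range(T)]
    return solve(A, b)


# Usage: build static-dynamic features from a toy sequence, then regenerate it.
x = [1.0, 2.0, 4.0, 3.0]
features = matvec(build_window_matrix(len(x)), x)
y_hat = mlpg(features, [1.0] * len(features))
```

When the static-dynamic means are exactly consistent with some static sequence, as in this toy usage, the generated trajectory matches that sequence up to floating-point error. With real model predictions the static and delta parts generally disagree, and the precisions decide how the disagreement is resolved. A practical system would exploit the banded structure of W^T P W instead of dense elimination.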
“…Deep Neural Networks (DNNs) [1] have been used as acoustic models for VC because they can represent the relationship between the input and output speech parameters more accurately than conventional Gaussian mixture models [2]. These acoustic models are trained with training algorithms such as the maximum likelihood criterion [3] and Minimum Generation Error (MGE) criterion [4], [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Our proposed enhancement system uses a statistical F0 pattern prediction, which is a part of voice conversion techniques [10], [11], to predict F0 patterns of normal speech from spectral features of EL speech. It consists of training and prediction processes as shown in Fig.…”
Section: Statistical F0 Pattern Prediction (mentioning)
confidence: 99%
“…This is also based on the singing-to-singing synthesis approach and is an extension of VocaListener, which deals with only pitch and dynamics. Much previous work has been done on manipulating voice timbre such as speaking voice conversion [12,13], emotional speech synthesis [14][15][16], singing voice conversion [17], and singing voice morphing [18]. However, these approaches cannot deal with intentional temporal timbre changes during singing.…”
Section: Vocalistener2: Singing Synthesis System Imitating Voice Timb (mentioning)
confidence: 99%