Interspeech 2017
DOI: 10.21437/interspeech.2017-1038

Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors

Abstract: The paper presents a mechanism to perform speaker adaptation in speech synthesis based on deep neural networks (DNNs). The mechanism extracts speaker identification vectors, so-called d-vectors, from the training speakers and uses them jointly with the linguistic features to train a multi-speaker DNN-based text-to-speech synthesizer (DNN-TTS). The d-vectors are derived by applying principal component analysis (PCA) to the bottleneck features of a speaker classifier network. At the adaptation stage, three variant…
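The abstract states only that d-vectors are obtained by applying PCA to the bottleneck features of a speaker classifier. The sketch below shows one way that step could look in plain Python. It assumes bottleneck frames have already been extracted for each training speaker, that PCA is fitted on all speakers' frames pooled together, and that a speaker's d-vector is the average projection of that speaker's frames onto the top `k` components. These choices, the value of `k`, and every function name are hypothetical illustrations, not the paper's recipe.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows: Sequence[Sequence[float]], mu: Vector) -> List[Vector]:
    """Sample covariance matrix (normalised by n - 1)."""
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        c = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j]
    return [[v / (len(rows) - 1) for v in r] for r in cov]


def top_components(cov: List[Vector], k: int, iters: int = 500) -> List[Vector]:
    """Leading k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    d = len(cov)
    a = [r[:] for r in cov]
    comps: List[Vector] = []
    for _ in range(k):
        # Non-uniform start vector, so it is unlikely to be orthogonal to the target.
        v = [1.0 / (i + 1) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue belonging to the component just found.
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def extract_d_vectors(bottleneck: Dict[str, List[Vector]], k: int) -> Dict[str, Vector]:
    """Hypothetical: PCA over pooled bottleneck frames, one averaged projection per speaker."""
    frames = [f for spk_frames in bottleneck.values() for f in spk_frames]
    mu = mean_vector(frames)
    comps = top_components(covariance(frames, mu), k)
    d_vectors: Dict[str, Vector] = {}
    for spk, spk_frames in bottleneck.items():
        # Projection is linear, so projecting the speaker mean equals
        # averaging the per-frame projections.
        centred = [x - m for x, m in zip(mean_vector(spk_frames), mu)]
        d_vectors[spk] = [sum(c * x for c, x in zip(comp, centred)) for comp in comps]
    return d_vectors


if __name__ == "__main__":
    feats = {
        "spk_a": [[-2.0, 0.1], [-2.0, -0.1], [-2.2, 0.0]],
        "spk_b": [[2.0, 0.1], [2.0, -0.1], [2.2, 0.0]],
    }
    print(extract_d_vectors(feats, k=1))
```

A real system would use a linear-algebra library for the eigendecomposition; the pure-Python power iteration is used here only to keep the sketch dependency-free. Note that the sign of each principal component is arbitrary, so only relative positions of speakers along a component are meaningful.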

Cited by 37 publications (31 citation statements)
References 22 publications
“…Our future work includes comparing our method with other adaptation methods such as LHUC and SVD bottleneck speaker adaptation with low-rank approximation. Another interesting experiment we would like to see is the use of i-vector or d-vector [24] as a scaling code.…”
Section: Discussion (mentioning, confidence: 99%)
“…As mentioned earlier, several techniques for speaker adaptation using i-vectors [5] or d-vectors [15] have been developed. As for the former, i-vectors are directly used as inputs for DNN-based speech synthesis.…”
Section: Advantage of Proposed Framework (mentioning, confidence: 99%)
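The excerpt above notes that i-vectors are used directly as inputs to the DNN synthesizer, and the abstract likewise says d-vectors are used jointly with the linguistic features. A minimal sketch of that input scheme, assuming the speaker vector is simply appended to every frame's linguistic feature vector, follows. The function names are hypothetical, and the sketch also checks a general property of this design: because the speaker vector is constant across an utterance, appending it is equivalent to adding a speaker-dependent bias to the first layer.

```python
from typing import List, Sequence


def append_speaker_code(linguistic_frames: Sequence[Sequence[float]],
                        speaker_code: Sequence[float]) -> List[List[float]]:
    """One DNN input per frame: linguistic features followed by the speaker vector."""
    return [list(frame) + list(speaker_code) for frame in linguistic_frames]


def affine(weights: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    """First-layer pre-activation W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def speaker_bias(weights: Sequence[Sequence[float]], bias: Sequence[float],
                 n_ling: int, speaker_code: Sequence[float]) -> List[float]:
    """Fold the speaker-code columns W_s of W into an equivalent bias W_s s + b."""
    return [sum(w * s for w, s in zip(row[n_ling:], speaker_code)) + b
            for row, b in zip(weights, bias)]


if __name__ == "__main__":
    W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]  # 2 linguistic dims + 1 speaker dim
    b = [0.1, -0.2]
    ling = [[1.0, 2.0], [3.0, -1.0]]
    spk = [0.4]
    b_spk = speaker_bias(W, b, n_ling=2, speaker_code=spk)
    for frame, x in zip(ling, append_speaker_code(ling, spk)):
        # Both views give the same first-layer pre-activation.
        print(affine(W, b, x), affine([row[:2] for row in W], b_spk, frame))
```

This equivalence is why, under this assumed input scheme, changing speakers at synthesis time only requires swapping the speaker vector: the rest of the network's weights are shared across speakers.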
“…An unsupervised speaker-adaptation technique using a bottle-neck layer of a DNN-based speaker-recognition model for DNN-based speech synthesis was proposed by Doddipatla et al [15]. As for this technique, PCA is applied to the bottle-neck features of the DNN-based speaker recognition, and the first eigenvector is interpolated on the basis of the posterior probabilities of the speaker-recognition model.…”
Section: Advantage of Proposed Framework (mentioning, confidence: 99%)
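The excerpt above summarizes the adaptation step as interpolating on the basis of the speaker-recognition model's posterior probabilities. One plausible reading, sketched here under stated assumptions, is a posterior-weighted (convex) combination of the training speakers' vectors. The assumptions are: each training speaker already has a vector, the utterance-level posterior is the mean of per-frame softmax outputs over the adaptation data, and all function names are hypothetical. None of these details is confirmed by the excerpt.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over speaker-classifier logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_posterior(frame_logits: Sequence[Sequence[float]]) -> List[float]:
    """Assumption: utterance-level posterior = mean of per-frame softmax outputs."""
    posts = [softmax(z) for z in frame_logits]
    return [sum(col) / len(posts) for col in zip(*posts)]


def interpolate_speaker_vector(posterior: Sequence[float],
                               speaker_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Posterior-weighted combination of the training speakers' vectors."""
    dim = len(speaker_vectors[0])
    return [sum(p * v[i] for p, v in zip(posterior, speaker_vectors)) for i in range(dim)]


if __name__ == "__main__":
    train_vectors = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical per-speaker vectors
    adaptation_logits = [[2.0, 0.0], [1.5, 0.5]]  # hypothetical classifier outputs
    post = average_posterior(adaptation_logits)
    print(post, interpolate_speaker_vector(post, train_vectors))
```

Because the posteriors sum to one, the resulting vector always lies inside the region spanned by the training speakers' vectors. This makes the adapted speaker code a blend of known speakers rather than an extrapolation, which requires no transcribed adaptation data, consistent with the excerpt's description of the technique as unsupervised.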
“…An effective way to solve this problem is to use a technique like speaker adaptation [16]- [20], in which a baseline model is trained using a large database, then adjusted to a target speaker using only a small amount of data. This approach can similarly be applied to expressiveness tasks through emotion transplantation, i.e.…”
Section: Introduction (mentioning, confidence: 99%)