9th ISCA Speech Synthesis Workshop (SSW 9), 2016
DOI: 10.21437/ssw.2016-22

Novel Pre-processing using Outlier Removal in Voice Conversion

Abstract: A voice conversion (VC) technique modifies a speech utterance spoken by a source speaker to make it sound as if a target speaker is speaking. Gaussian Mixture Model (GMM)-based VC is a state-of-the-art method. It finds the mapping function by modeling the joint density of the source and target speakers using a GMM to convert spectral features framewise. As with any real dataset, the spectral parameters contain a few points that are inconsistent with the rest of the data, called outliers. Until now, there has been ver…

Cited by 8 publications (2 citation statements)
References: 21 publications
“…Recently, equalizing formant locations using Dynamic Frequency Warping (DFW) was proposed to tackle this issue [6]. In addition, some approaches filter such pairs out of the training data [7,8]. However, losing pairs is not useful when the amount of training data is small.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Stand-alone VC techniques that are based on Gaussian Mixture Model (GMM) [2,3], frequency warping (FW) [4,5], exemplar [6] and Deep Neural Network (DNN) [7][8][9] require aligned spectral features before learning the mapping function. In the VC literature, it has been shown that alignment accuracy clearly affects the quality of the converted speech signal [10][11][12]. Hence, accurately aligned spectral features from both the source and the target speakers' training speech databases are required.…”
Section: Introduction
Citation type: mentioning
confidence: 99%