2008
DOI: 10.1109/icassp.2008.4518698

LSF mapping for voice conversion with very small training sets

Abstract: To make voice conversion usable in practical applications, the number of training sentences should be minimized. With traditional Gaussian mixture model (GMM)-based techniques, small training sets lead to over-fitting and estimation problems. We propose a new approach for mapping line spectral frequencies (LSFs) representing the vocal tract. The idea is based on the inherent intra-frame correlations of LSFs. For each target LSF, a separate GMM is used, and only the source and target LSF elements best correlating with…
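
The abstract above is truncated, so the exact selection rule is not given here. As a minimal sketch, assume that for each target LSF the source LSF elements are ranked by absolute Pearson correlation with that target LSF over time-aligned training frames, and that the per-target GMM is then trained on the low-dimensional joint vector [selected source LSFs, one target LSF]. The helper names `select_source_elements` and `joint_vectors` and the number of kept elements `k` are hypothetical, and this sketch only selects source elements, not target ones.

```python
import numpy as np


def select_source_elements(src_lsf, tgt_lsf, target_index, k=3):
    """Hypothetical selection rule: indices of the k source LSF elements whose
    |Pearson r| with target LSF `target_index` is largest, best first.

    src_lsf, tgt_lsf: time-aligned LSF matrices of shape (n_frames, order).
    """
    y = tgt_lsf[:, target_index]
    scores = np.array([abs(np.corrcoef(src_lsf[:, j], y)[0, 1])
                       for j in range(src_lsf.shape[1])])
    return np.argsort(scores)[::-1][:k]


def joint_vectors(src_lsf, tgt_lsf, target_index, k=3):
    """Build [selected source LSFs, one target LSF] training vectors on which a
    small per-target GMM could be fitted (assumed structure, see lead-in)."""
    idx = select_source_elements(src_lsf, tgt_lsf, target_index, k)
    z = np.hstack([src_lsf[:, idx], tgt_lsf[:, [target_index]]])
    return idx, z
```

With `k = 3`, each per-target model works on 4-dimensional joint vectors instead of the 2p-dimensional vectors of a conventional joint GMM over all p LSFs; smaller models need fewer frames to estimate, which is the motivation the abstract gives.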

Cited by 13 publications (7 citation statements)
References 7 publications

“…This process will increase the energy of the signal at higher frequencies [3], as shown in figure (2).

$$s_2(n) = s(n) - a\,s(n-1) \qquad (1)$$

where $s(n)$ is the speech signal, $s_2(n)$ is the output signal, and the value of $a$ is usually between 0.9 and 1.0. The z-transform of the filter is …”
Section: A. Preemphasis (mentioning)
confidence: 99%
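
The pre-emphasis filter in equation (1) can be sketched directly. This is a minimal illustration, assuming a default coefficient `a = 0.97` (any value in the quoted 0.9 to 1.0 range would do) and assuming the first sample is passed through unchanged, i.e. s(-1) = 0; the function name `pre_emphasis` is hypothetical.

```python
import numpy as np


def pre_emphasis(s, a=0.97):
    """First-order pre-emphasis, s2(n) = s(n) - a * s(n - 1).

    The first sample has no predecessor, so it is copied unchanged
    (equivalent to assuming s(-1) = 0). `a = 0.97` is an assumed default.
    """
    s = np.asarray(s, dtype=float)
    if s.size == 0:
        return s.copy()
    s2 = np.empty_like(s)
    s2[0] = s[0]
    s2[1:] = s[1:] - a * s[:-1]
    return s2
```

With `a = 0.9`, a constant (DC) input `[1, 1, 1, 1]` becomes `[1, 0.1, 0.1, 0.1]`, while an alternating (highest-frequency) input `[1, -1, 1, -1]` becomes `[1, -1.9, 1.9, -1.9]`: after the first sample, low frequencies are scaled by 1 - a and the highest frequency by 1 + a, which is the high-frequency boost the passage describes.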
“…Examples of such features include MFCCs (Mel-Frequency Cepstral Coefficients) and LSFs (Line Spectral Frequencies) [1]. The aim of this paper is to extract MFCCs and use them with a GMM for voice conversion of Arabic spoken words.…”
Section: Introduction (mentioning)
confidence: 99%
“…Line Spectral Frequencies (LSFs) were selected to represent the vocal characteristics of the source and target speakers because of their favorable interpolation properties and stability [16]. A 20 ms Hanning window with a 10 ms overlap was used to compute and extract the LPC parameters.…”
Section: Objective Evaluation (mentioning)
confidence: 99%
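
The analysis described in this passage (20 ms Hanning-windowed frames with a 10 ms hop, LPC per frame, then conversion to LSFs) can be sketched as follows. This is a minimal illustration, not the cited authors' code: it assumes the autocorrelation method with the Levinson-Durbin recursion, a 16 kHz sampling rate, an LPC order of 16, and the standard sum/difference-polynomial construction of LSFs; the helper names are hypothetical, and only even LPC orders are handled.

```python
import numpy as np


def lpc_levinson(r, order):
    """Levinson-Durbin recursion on autocorrelation r[0..order].

    Returns [1, a1, ..., a_order] with A(z) = 1 + sum_k a_k z^-k.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient of stage i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_lsf(a):
    """Convert A(z) of even order p into p LSFs (radians, ascending, in (0, pi))."""
    p = len(a) - 1
    assert p % 2 == 0, "this sketch handles even LPC orders only"
    a_ext = np.append(a, 0.0)
    P = a_ext + a_ext[::-1]  # symmetric polynomial, trivial root at z = -1
    Q = a_ext - a_ext[::-1]  # antisymmetric polynomial, trivial root at z = +1
    P, _ = np.polydiv(P, [1.0, 1.0])   # divide out the root at z = -1
    Q, _ = np.polydiv(Q, [1.0, -1.0])  # divide out the root at z = +1
    roots = np.concatenate([np.roots(P), np.roots(Q)])
    # Remaining roots lie on the unit circle in conjugate pairs; keep the upper half.
    return np.sort(np.angle(roots[roots.imag > 0]))


def lsf_frames(x, fs=16000, order=16, win_ms=20.0, hop_ms=10.0):
    """Hanning-windowed framing (20 ms window, 10 ms hop), then LPC -> LSF per frame.

    fs and order are assumed values; the passage does not state them.
    """
    x = np.asarray(x, dtype=float)
    n = int(round(fs * win_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))
    window = np.hanning(n)
    out = []
    for start in range(0, len(x) - n + 1, hop):
        frame = x[start:start + n] * window
        r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
        r[0] += 1e-9  # avoid division by zero on all-zero (silent) frames
        out.append(lpc_to_lsf(lpc_levinson(r, order)))
    return np.array(out)
```

The "stability" property follows from the ordering: for a minimum-phase A(z), which the autocorrelation method yields, the LSFs are distinct, strictly increasing, and inside (0, π). Any weighted average of two such sorted vectors is again sorted and inside (0, π), which is why interpolating LSFs is considered well behaved.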
“…The joint density Gaussian mixture model (JD-GMM) [4], [5], [6] is one of the most effective approaches. Unfortunately, it requires a relatively large amount of parallel training data to avoid over-fitting [8].…”
Section: Introduction (mentioning)
confidence: 99%
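
The JD-GMM conversion these papers refer to is commonly written as the conditional expectation of the target vector y given the source vector x under a GMM fitted to joint vectors z = [x; y]: ŷ = Σ_m p(m | x) (μ_y^m + Σ_yx^m (Σ_xx^m)^-1 (x − μ_x^m)). The sketch below assumes the GMM weights, means, and full covariances were already estimated (for example by EM on time-aligned source/target frames; training is not shown). The helper names `jdgmm_convert` and `joint_gmm_param_count` are hypothetical; the second one counts free parameters to show why such a model over-fits with little data.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of a multivariate normal with full covariance at x."""
    d = len(x)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def jdgmm_convert(x, weights, means, covs):
    """Map source vector x (dim D) with a joint GMM over z = [x; y].

    weights: (M,), means: (M, 2D), covs: (M, 2D, 2D), assumed pre-trained.
    Returns E[y | x] = sum_m p(m|x) * (mu_y + S_yx S_xx^-1 (x - mu_x)).
    """
    D = len(x)
    log_post = np.array([np.log(w) + log_gauss(x, mu[:D], S[:D, :D])
                         for w, mu, S in zip(weights, means, covs)])
    post = np.exp(log_post - log_post.max())  # subtract max for numerical safety
    post /= post.sum()                        # posterior p(m | x)
    y = np.zeros(means.shape[1] - D)
    for p_m, mu, S in zip(post, means, covs):
        y += p_m * (mu[D:] + S[D:, :D] @ np.linalg.solve(S[:D, :D], x - mu[:D]))
    return y


def joint_gmm_param_count(dim, n_components):
    """Free parameters of a full-covariance GMM on dim-dimensional vectors."""
    per_component = dim + dim * (dim + 1) // 2  # mean + symmetric covariance
    return n_components * per_component + (n_components - 1)  # + mixture weights
```

For example, with an assumed 16 LSFs per frame, a joint GMM over 32-dimensional vectors with 8 full-covariance components has `joint_gmm_param_count(32, 8)` = 4487 free parameters, whereas the same number of components over 4-dimensional vectors has 119. With only a few training sentences there may be too few frames to estimate the larger model reliably, which is the over-fitting problem the passage mentions.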