Towards a Voice Conversion System Based on Frame Selection

Dutoit, Thierry; Holzapfel, André; Jottrand, Matthieu; Moinet, Alexis; Pérez, Javier; Stylianou, Yannis

doi:10.1109/icassp.2007.366962

Cited by 33 publications

(24 citation statements)

References 10 publications

(16 reference statements)

“…S1 A simplified frame-selection-based [12], [13] voice-conversion algorithm, in which converted speech is generated from the selection of target speech frames. For computational efficiency, target frames are selected without taking the inter-frame joint cost into account.…”

Section: A Spoofing-attack Algorithms (mentioning)

confidence: 99%

ASVspoof: The Automatic Speaker Verification Spoofing and Countermeasures Challenge

Yamagishi, Kinnunen, et al. 2017

IEEE J. Sel. Top. Signal Process.

Abstract: Concerns regarding the vulnerability of automatic speaker verification (ASV) technology against spoofing can undermine confidence in its reliability and form a barrier to exploitation. The absence of competitive evaluations and the lack of common datasets has hampered progress in developing effective spoofing countermeasures. This paper describes the ASV Spoofing and Countermeasures (ASVspoof) initiative, which aims to fill this void. Through the provision of a common dataset, protocols, and metrics, ASVspoof promotes a sound research methodology and fosters technological progress. This paper also describes the ASVspoof 2015 dataset, evaluation, and results with detailed analyses. A review of post-evaluation studies conducted using the same dataset illustrates the rapid progress stemming from ASVspoof and outlines the need for further investigation. Priority future research directions are presented in the scope of the next ASVspoof evaluation planned for 2017.
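The simplified frame selection described in the S1 citation statement above (target frames chosen one at a time, with no inter-frame joint cost) can be illustrated with a minimal sketch. This is not the implementation used in the cited work: the function name `select_frames`, the squared Euclidean distance, and the toy feature arrays are assumptions made for illustration, and the sketch assumes source and target features are directly comparable.

```python
import numpy as np

def select_frames(source_feats, target_feats):
    """Pick, for every source frame, the closest target frame.

    Only a per-frame target cost is used (here: squared Euclidean
    distance, an assumption); no joint cost between consecutive
    selected frames is considered.
    """
    # Pairwise squared distances, shape (n_source, n_target).
    diff = source_feats[:, None, :] - target_feats[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    # Independent nearest-neighbour choice for each source frame.
    idx = np.argmin(dist, axis=1)
    return idx, target_feats[idx]

# Hypothetical 2-dimensional feature frames.
source = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
target = np.array([[0.9, 1.2], [3.5, 4.5], [0.1, -0.2]])

indices, converted = select_frames(source, target)
print(indices.tolist())  # [2, 0, 1]
```

Because each frame is chosen independently, the cost is a single distance matrix and one `argmin` per frame, which is the efficiency gain the statement refers to; the trade-off is that nothing discourages discontinuities between consecutive selected frames.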

“…Because one rich context model corresponds to one joint feature vector, the proposed processes are related to sample-based voice conversion [28]. The target cost and concatenation cost of the sample-based approach are regarded as the likelihoods for the static and dynamic parameters [29], [30].…”

Section: Discussion (mentioning)

confidence: 99%

A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models

Takamichi, Toda, Neubig, et al. 2016

IEICE Trans. Inf. & Syst.

Summary: This paper presents a novel statistical sample-based approach for Gaussian Mixture Model (GMM)-based Voice Conversion (VC). Although GMM-based VC has the promising flexibility of model adaptation, quality in converted speech is significantly worse than that of natural speech. This paper addresses the problem of inaccurate modeling, which is one of the main reasons causing the quality degradation. Recently, we have proposed statistical sample-based speech synthesis using rich context models for high-quality and flexible Hidden Markov Model (HMM)-based Text-To-Speech (TTS) synthesis. This method makes it possible not only to produce high-quality speech by introducing ideas from unit selection synthesis, but also to preserve flexibility of the original HMM-based TTS. In this paper, we apply this idea to GMM-based VC. The rich context models are first trained for individual joint speech feature vectors, and then we gather them mixture by mixture to form a Rich context-GMM (R-GMM). In conversion, an iterative generation algorithm using R-GMMs is used to convert speech parameters, after initialization using over-trained probability distributions. Because the proposed method utilizes individual speech features, and its formulation is the same as that of conventional GMM-based VC, it makes it possible to produce high-quality speech while keeping flexibility of the original GMM-based VC. The experimental results demonstrate that the proposed method yields significant improvements in terms of speech quality and speaker individuality in converted speech.

Key words: GMM-based voice conversion, sample-based speech synthesis, speech parameter conversion, rich context model
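The citation statement above refers to the target cost and concatenation cost of sample-based approaches. As a hedged illustration of how the two costs interact, the sketch below runs a generic dynamic-programming (Viterbi-style) search over candidate target frames. It is not the method of either cited paper: the function name `viterbi_select`, the squared Euclidean costs, the `concat_weight` parameter, and the toy data are all assumptions.

```python
import numpy as np

def viterbi_select(source_feats, target_feats, concat_weight=1.0):
    """Choose one target frame per source frame by minimising
    target cost + concat_weight * concatenation cost over the sequence."""
    n_src, n_tgt = len(source_feats), len(target_feats)
    # Target cost: distance between each source frame and each candidate.
    tc = ((source_feats[:, None, :] - target_feats[None, :, :]) ** 2).sum(axis=2)
    # Concatenation cost: distance between consecutive selected candidates
    # (rows: previous candidate, columns: current candidate).
    cc = ((target_feats[:, None, :] - target_feats[None, :, :]) ** 2).sum(axis=2)

    cost = tc[0].copy()                       # best cost ending in each candidate
    back = np.zeros((n_src, n_tgt), dtype=int)
    for t in range(1, n_src):
        total = cost[:, None] + concat_weight * cc
        back[t] = np.argmin(total, axis=0)    # best predecessor per candidate
        cost = total[back[t], np.arange(n_tgt)] + tc[t]

    # Backtrack from the cheapest final candidate.
    path = [int(np.argmin(cost))]
    for t in range(n_src - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical 2-dimensional feature frames.
source = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
target = np.array([[0.9, 1.2], [3.5, 4.5], [0.1, -0.2]])

print(viterbi_select(source, target, concat_weight=0.0))  # [2, 0, 1]
print(viterbi_select(source, target, concat_weight=1e6))  # [0, 0, 0]
```

With `concat_weight=0.0` the search reduces to independent per-frame selection; with a very large weight it never switches candidates, so the weight controls the trade-off between matching each frame and keeping the selected sequence smooth.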

“…For the source feature sequence , the target model and the arbitrarily selected HMM state sequence , the log-likelihood of the target feature sequence is given by (15) where The optimal transformed sequence is then given by (16) where is the set of candidates for and denotes the set of all possible HMM state sequences.…”

Section: Feature Selection (mentioning)

confidence: 99%

“…Recently, a unit-selection based approach, which was originally devised for implementing the corpus-based concatenative text-to-speech (TTS) systems [20], was used to both alter the VTF parameters [13,14,16] and predict the target LP-residuals [15]. This paper is an extension of our previous work on voice transformation [12] based on a statistical approach.…”

Section: Introduction (mentioning)

confidence: 99%

“…Voice personality transformation [1]-[16] is a process by which voice personality is altered, so that one voice is made to sound like another. The process has numerous applications in a variety of areas such as personification of text-to-speech synthesis systems, preprocessing for speech recognition [17], and enhancing the intelligibility of abnormal speech [8].…”

Section: Introduction (mentioning)

confidence: 99%

Feature Selection-based Voice Transformation

Lee 2012

The Journal of the Acoustical Society of Korea

A voice transformation (VT) method that can make the utterance of a source speaker mimic that of a target speaker is described. Speaker individuality transformation is achieved by altering three feature parameters: the LPC cepstrum, pitch period, and gain. The main objective of this study is the construction of an optimal sequence of features selected from a target speaker's database, to maximize both the correlation probabilities between the transformed and the source features and the likelihood of the transformed features with respect to the target model. A set of two-pass conversion rules is proposed, where the feature parameters are selected from a database in the first pass and the optimal sequence of the feature parameters is constructed in the second pass. The conversion rules were developed using a statistical approach that employed a maximum likelihood criterion. In constructing an optimal sequence of the features, a hidden Markov model (HMM) was employed to find the most likely combination of the features with respect to the target speaker's model. The effectiveness of the proposed transformation method was evaluated using objective tests and informal listening tests. We confirmed that the proposed method leads to perceptually more preferred results, compared with the conventional methods.
