2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07 2007
DOI: 10.1109/icassp.2007.366962

Towards a Voice Conversion System Based on Frame Selection

Abstract: The subject of this paper is the conversion of a given speaker's voice (the source speaker) into another identified voice (the target one). We assume we have at our disposal a large amount of speech samples from source and target voice with at least a part of them being parallel. The proposed system is built on a mapping function between source and target spectral envelopes followed by a frame selection algorithm to produce final spectral envelopes. Converted speech is produced by a basic LP analysis of the so…

Citations: cited by 33 publications (24 citation statements)
References: 10 publications (16 reference statements)

“…S1 A simplified frame-selection-based [12], [13] voice-conversion algorithm, in which converted speech is generated from the selection of target speech frames. For computational efficiency, target frames are selected without taking the inter-frame joint cost into account.…”
Section: A Spoofing-attack Algorithms (mentioning)
confidence: 99%
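
The simplified selection described in this statement can be sketched as an independent nearest-neighbour search per frame. This is a minimal illustration, not the cited authors' implementation: the function names, the squared Euclidean distance, and the toy two-dimensional "spectral" vectors are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_frames(mapped_frames, target_frames):
    """For each mapped source frame, return the index of the closest target frame.

    Each frame is chosen independently, so no inter-frame (join) cost is used.
    """
    if not target_frames:
        raise ValueError("target_frames must not be empty")
    selected = []
    for frame in mapped_frames:
        best = min(
            range(len(target_frames)),
            key=lambda i: squared_distance(frame, target_frames[i]),
        )
        selected.append(best)
    return selected


# Hypothetical toy data: mapped source frames and a small target inventory.
mapped = [[0.1, 0.0], [0.9, 1.1], [2.2, 1.9]]
target = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
indices = select_frames(mapped, target)
```

Because every frame is selected in isolation, the cost grows linearly with the number of input frames, which is the efficiency gain the statement refers to; the price is that consecutive selected frames may not join smoothly.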
“…Because one rich context model corresponds to one joint feature vector, the proposed processes are related to sample-based voice conversion [28]. The target cost and concatenation cost of the sample-based approach are regarded as the likelihoods for the static and dynamic parameters [29], [30].…”
Section: Discussion (mentioning)
confidence: 99%
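
To make the two costs named in this statement concrete, the sketch below runs a dynamic-programming (Viterbi-style) search that minimises the sum of a target cost and a concatenation cost. It is an illustration under stated assumptions, not the method of any cited paper: both costs are taken to be squared Euclidean distances, and `concat_weight` is a hypothetical parameter that trades one against the other.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_frames_viterbi(mapped_frames, target_frames, concat_weight=1.0):
    """Return target-frame indices minimising target cost plus weighted join cost.

    Target cost: distance between a mapped frame and the chosen target frame.
    Concatenation cost (assumed form): distance between consecutive chosen frames.
    """
    n_targets = len(target_frames)
    if not mapped_frames or n_targets == 0:
        return []

    def join(i, j):
        return concat_weight * squared_distance(target_frames[i], target_frames[j])

    # cost[j] is the cheapest cumulative cost of any path ending in target frame j.
    cost = [squared_distance(mapped_frames[0], t) for t in target_frames]
    backpointers = []
    for frame in mapped_frames[1:]:
        new_cost, pointers = [], []
        for j in range(n_targets):
            # Cheapest predecessor once the join into frame j is included.
            best_i = min(range(n_targets), key=lambda i: cost[i] + join(i, j))
            step = squared_distance(frame, target_frames[j])
            new_cost.append(cost[best_i] + join(best_i, j) + step)
            pointers.append(best_i)
        cost = new_cost
        backpointers.append(pointers)

    # Backtrack from the cheapest final frame to recover the whole path.
    j = min(range(n_targets), key=lambda k: cost[k])
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path


# Hypothetical one-dimensional toy data.
mapped = [[0.0], [1.0], [0.0]]
target = [[0.0], [1.0]]
greedy_path = select_frames_viterbi(mapped, target, concat_weight=0.0)
smooth_path = select_frames_viterbi(mapped, target, concat_weight=10.0)
```

With `concat_weight=0.0` the search reduces to picking the nearest target frame for each input frame; a large weight makes jumps between dissimilar frames expensive, so the path stays on one frame. The search costs on the order of the number of input frames times the square of the inventory size, which is why simplified systems may drop the join term.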
“…For the source feature sequence , the target model and the arbitrarily selected HMM state sequence , the log-likelihood of the target feature sequence is given by (15) where The optimal transformed sequence is then given by (16) where is the set of candidates for and denotes the set of all possible HMM state sequences.…”
Section: Feature Selection (mentioning)
confidence: 99%
“…Recently, a unit-selection-based approach, which was originally devised for implementing corpus-based concatenative text-to-speech (TTS) systems [20], was used both to alter the VTF parameters [13,14,16] and to predict the target LP-residuals [15]. This paper is an extension of our previous work on voice transformation [12] based on a statistical approach.…”
Section: Introduction (mentioning)
confidence: 99%