2020
DOI: 10.1016/j.specom.2020.05.004
DeepConversion: Voice conversion with limited parallel training data

Cited by 20 publications (5 citation statements)
References 30 publications

“…Mel-cepstral distortion (MCD) [246] is commonly used to measure the difference between two spectral features [62], [67], [256], [257]. It is calculated between the converted and target Mel-cepstral coefficients, or MCEPs, [258], [259], ŷ and y.…”
Section: A. Objective Evaluation, 1) Spectrum Conversion (mentioning, confidence: 99%)

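The statement above only names the metric. A minimal sketch of the commonly used frame-averaged form may help: for each frame, MCD = (10 / ln 10) · sqrt(2 · Σ_d (y_d − ŷ_d)²) in dB, summed over the cepstral dimensions d (conventionally excluding the energy coefficient c0), then averaged over frames. The sketch assumes the converted and target MCEP sequences are already time-aligned (for example by DTW, which is not shown), and the function name `mel_cepstral_distortion` is illustrative, not taken from the cited works.

```python
import numpy as np


def mel_cepstral_distortion(target, converted, skip_c0=True):
    """Frame-averaged Mel-cepstral distortion (dB) between aligned MCEPs.

    target, converted: arrays of shape (frames, order + 1).
    Assumption: frames are already aligned one-to-one (e.g. via DTW).
    """
    target = np.asarray(target, dtype=float)
    converted = np.asarray(converted, dtype=float)
    if target.shape != converted.shape:
        raise ValueError("MCEP sequences must be time-aligned with equal shapes")
    if skip_c0:
        # c0 mainly carries frame energy, so it is usually left out of MCD
        target, converted = target[:, 1:], converted[:, 1:]
    diff = target - converted
    # Per-frame distortion in dB, then the mean over all frames
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))
```

Excluding c0 is a convention rather than part of the definition, which is why it is exposed as a parameter here.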
“…Text-based approaches that use ASR models have accurate linguistic information, are unlikely to be corrupted during voice conversion, and can even perform voice conversion between speakers of different languages if the ASR model used supports multiple languages. For example, DeepConversion utilizes an ASR model to perform voice conversion by mapping PPGs, speaker-dependent features, and Mel-Cepstral coefficients (MCEP) [25]. However, because a large amount of parallel data is required to train an ASR model to extract PPGs used in voice conversion, there may be inevitable errors in the process of extracting PPGs (owing to insufficient data) for training in a low-resource, multilingual environment.…”
Section: Introduction (mentioning, confidence: 99%)

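The statement above describes conversion through phonetic posteriorgrams (PPGs): an ASR acoustic model turns each frame into a vector of phonetic-class posteriors that is largely speaker-independent, and a mapping from those posteriors to the target speaker's MCEPs produces the converted spectrum. As a minimal sketch of that general idea only, the example below substitutes a linear least-squares regression for the neural mapping. It is not DeepConversion's architecture. The PPG extractor is assumed to exist upstream, and `fit_ppg_to_mcep` and `convert_ppg` are hypothetical names.

```python
import numpy as np


def fit_ppg_to_mcep(ppg, mcep):
    """Fit a linear map W so that ppg @ W approximates mcep (least squares).

    ppg:  (frames, n_phonetic_classes) posteriors; each row sums to 1.
    mcep: (frames, mcep_dim) target-speaker MCEPs for the same frames.
    Hypothetical stand-in for a learned neural PPG-to-MCEP mapping.
    """
    # No bias column is needed: because every PPG row sums to 1, a constant
    # offset b is absorbed into W (ppg @ (W + 1 b^T) = ppg @ W + 1 b^T).
    W, *_ = np.linalg.lstsq(ppg, mcep, rcond=None)
    return W


def convert_ppg(ppg, W):
    """Map PPG frames from any source speaker to target-speaker MCEPs."""
    return ppg @ W
```

Because the map is trained only on the target speaker's PPG/MCEP pairs, this kind of approach does not need parallel source-target utterances, which is the property the quoted statement attributes to text-based methods.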
“…Many state-of-the-art VC methods [23]- [25] have been proposed and implemented for parallel and non-parallel VC. It is possible to train the parallel VC in a limited dataset [26]. If the performance of VC is not precise enough, voice augmentation for VC is possible.…”
Section: Introduction (mentioning, confidence: 99%)