An Objective Evaluation Framework for Pathological Speech Synthesis

Halpern, Bence Mark; Fritsch, Julian; Hermann, Enno; Son, R.J.J.H. van; Scharenborg, Odette; Magimai-Doss, Mathew

doi:10.48550/arxiv.2107.00308

Cited by 1 publication

(1 citation statement)

References 21 publications


“…There are two previous works that focus on VC for clinical usage. The diagram on the left of Figure 1a depicts an N2D VC system presented in [5], which was a combination of a CycleGAN-based frame-wise VC model and a PSOLA-based speech rate modification process. This method suffers from the same issues as those in Section 2.1, including audible vocoder artifacts brought by the extra PSOLA operation, and the inability to preserve the speaker identity of the control speaker.…”

Section: Normal-to-dysarthric VC for Clinical Usage (mentioning)

confidence: 99%

Towards Identity Preserving Normal to Dysarthric Voice Conversion

Huang¹, Halpern², Violeta³ et al. 2021

Preprint

Self Cite


We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision-making processes and alleviation of patient stress, and (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the converted samples should capture the severity of dysarthric speech while being highly natural and possessing the speaker identity of the normal speaker. To this end, we adopted a two-stage framework, which consists of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations were conducted on the UASpeech dataset, and results showed that the method was able to yield reasonable naturalness and capture severity aspects of the pathological speech. On the other hand, the similarity to the normal source speaker's voice was limited and requires further improvement.
