ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8682648

Cycle-consistent Adversarial Networks for Non-parallel Vocal Effort Based Speaking Style Conversion

Abstract: Speaking style conversion (SSC) is the technology of converting natural speech signals from one style to another. In this study, we propose the use of cycle-consistent adversarial networks (CycleGANs) for converting styles with varying vocal effort, and focus on conversion between normal and Lombard styles as a case study of this problem. We propose a parametric approach that uses the Pulse Model in Log domain (PML) vocoder to extract speech features. These features are mapped using the CycleGAN from utterance…

Citations: cited by 12 publications (11 citation statements)
References: 28 publications (40 reference statements)

“…The loss function of an augmented CycleGAN can be formulated similar to that of a standard CycleGAN used in [10]. In our implementation, we use the Wasserstein distance metric (WGAN loss) with gradient penalty [16] to determine the adversarial loss, defined as…”
Section: Augmented CycleGAN (citation type: mentioning)
confidence: 99%
“…where λ_cyc and λ_id control the relative importance of the cyclic reconstruction loss and the identity mapping loss respectively. Following our previous works [9,10] on SSC, the current study uses a parametric system utilizing frame level Pulse Model in Log domain (PML [18]) vocoder features for the normal-to-Lombard conversion. The basic block diagram is shown in Figure 2.…”
Section: Augmented CycleGAN (citation type: mentioning)
confidence: 99%
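
Illustrative note: the statement above describes two weights, λ_cyc and λ_id, that scale a cyclic reconstruction loss and an identity mapping loss. The exact equation is elided in the quote, so the sketch below only shows the generic weighted-sum form commonly used for CycleGAN objectives, not the citing paper's formulation. The function names, the L1 distance, the default weight values, and the toy numbers are all assumptions made for illustration.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def total_generator_loss(adv_loss, cyc_loss, id_loss,
                         lambda_cyc=10.0, lambda_id=5.0):
    """Weighted sum: adversarial + lambda_cyc * cycle + lambda_id * identity.

    The default weights are hypothetical placeholders, not values from the source.
    """
    return adv_loss + lambda_cyc * cyc_loss + lambda_id * id_loss


# Toy feature vectors (made-up numbers, not data from any paper).
x = [1.0, 2.0, 3.0]            # features of a source-style frame
x_cycled = [1.5, 2.0, 2.5]     # same frame after source -> target -> source
x_identity = [1.0, 2.5, 3.0]   # output of a generator fed a frame already in its output style

cyc = l1_distance(x, x_cycled)        # cyclic reconstruction error
idt = l1_distance(x, x_identity)      # identity mapping error
total = total_generator_loss(adv_loss=0.2, cyc_loss=cyc, id_loss=idt)
print(f"cycle={cyc:.4f} identity={idt:.4f} total={total:.4f}")
```

Raising lambda_cyc in this sketch makes the cyclic reconstruction term dominate the total, which is the sense in which the weights "control the relative importance" of each loss.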
“…Right after the GAN was invented [16], many GAN-based architectures actively used for VC [7, 17-22]. Recently, Cycle-consistent Adversarial Network (CycleGAN) gives the state-of-the-art results in VC [23-26]. While other variants of GANs, such as MMSE DiscoGAN proposed in [27] is also shown significant results for WHSP2SPCH conversion task, in particular, for prediction of F0.…”
Section: Introduction (citation type: mentioning)
confidence: 99%