Interspeech 2019
DOI: 10.21437/interspeech.2019-1681

Augmented CycleGANs for Continuous Scale Normal-to-Lombard Speaking Style Conversion

Abstract: Lombard speech is a speaking style associated with increased vocal effort that is naturally used by humans to improve intelligibility in the presence of noise. It is hence desirable to have a system capable of converting speech from normal to Lombard style. Moreover, it would be useful if one could adjust the degree of Lombardness in the converted speech so that the system is more adaptable to different noise environments. In this study, we propose the use of recently developed Augmented cycle-consistent advers…

Cited by 9 publications (5 citation statements)
References 25 publications
“…Wang et al, 2018) and have also started to be applied to speech transformations. For example, GANs were recently used to transform a voice into its Lombard counterpart (a particular type of vocal effort which makes the voice more intelligible in background noise; Seshadri, Juvela, Alku, & Räsänen, 2019). All such advances open exciting new possibilities to create emotional voice and speech transformations, which will certainly find their way to the community in the upcoming years.…”
Section: A Prospective Note On Deep-learning Techniques (mentioning)
confidence: 99%
“…Finally, generative adversarial networks (GANs), a special class of DNN architecture capable of learning a deterministic mapping from one style of stimulus to another (Goodfellow et al, 2014), are increasingly used to create visual transformations (e.g., smiles; W. Wang et al, 2018) and have also started to be applied to speech transformations. For example, GANs were recently used to transform a voice into its Lombard counterpart (a particular type of vocal effort which makes the voice more intelligible in background noise; Seshadri, Juvela, Alku, & Räsänen, 2019). All such advances open exciting new possibilities to create emotional voice and speech transformations, which will certainly find their way to the community in the upcoming years.…”
Section: A Prospective Note On Deep-learning Techniques (mentioning)
confidence: 99%
“…To overcome this, deep neural network approaches were implemented where the robustness of acoustic modeling is improved by efficient mapping between linguistic and acoustic features. Inspired by the success of adversarial generative models, Cycle-consistent adversarial networks (CycleGANs) showed promising results in terms of speech quality and the magnitude of the perceptual change between speech styles [11,12]. An extension to recurrent neural networks and particularly long short-term memory networks (LSTMs) were proposed that it successfully adapted normal speaking style to Lombard style [13].…”
Section: Introduction (mentioning)
confidence: 99%
“…Inspired by human speech production characteristics, some algorithms (e.g., [13], [14], [15]) aim to convert normal speech to Lombard speech [16], which is naturally produced by speakers with increased vocal effort for higher intelligibility. To achieve speaking style conversion, most algorithms rely on vocoder-based analysis-and-synthesis techniques, where vocoder features are transformed to fit in the Lombard style.…”
Section: Introduction (mentioning)
confidence: 99%
“…To achieve speaking style conversion, most algorithms rely on vocoder-based analysis-and-synthesis techniques, where vocoder features are transformed to fit in the Lombard style. For example, Seshadri et al [15] modified Mel-generalized cepstrum coefficients [17] of input speech to generate the Lombard-style speech by using log-domain pulse model vocoder [18]. However, using such a parametric vocoder inevitably degrades the converted speech quality.…”
Section: Introduction (mentioning)
confidence: 99%