Speaking Style Conversion from Normal to Lombard Speech Using a Glottal Vocoder and Bayesian GMMs

López, Ana Ramírez; Seshadri, Shreyas; Juvela, Lauri; Räsänen, Okko; Alku, Paavo

doi:10.21437/interspeech.2017-400

Cited by 17 publications

(13 citation statements)

References 21 publications

“…In the baseline system with parallel data, a standard GMM is used as it was shown to compare well against DNNs and non-parametric Bayesian methods in an earlier study with the present data set [14].…”

Section: Parallel GMM Learning (mentioning)

confidence: 87%

“…Moreover, the amount of parallel training data, where the utterances in the source and target styles are from the same speaker speaking the same linguistic content, is limited. Our earlier work [14,15] also suggest that the limited availability of parallel data in normal and Lombard styles causes a bottleneck in system performance. This encourages the use of nonparallel mapping models within the parametric SSC framework.…”

Section: Introduction (mentioning)

confidence: 92%

“…Lombard speech corresponds to a speaking style that talkers naturally employ in noisy environments to improve intelligibility. We use a data driven parametric setup that uses a vocoder to extract speech features (see [14]). These features are mapped…”

Section: Introduction (mentioning)

confidence: 99%

“…Given this background, the overall goal of the current paper is to study the applicability of CycleGANs for the task of vocal effort based SSC (normal vs. Lombard), and to compare it to the standard INCA-based non-parallel approach and to our previous baseline system utilizing parallel data [14]. The systems are compared using subjective listening tests evaluating the success of style conversion and overall quality of the converted speech.…”

Section: Introduction (mentioning)

confidence: 99%

Cycle-consistent Adversarial Networks for Non-parallel Vocal Effort Based Speaking Style Conversion

Seshadri, Juvela, Yamagishi et al. 2019

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

Speaking style conversion (SSC) is the technology of converting natural speech signals from one style to another. In this study, we propose the use of cycle-consistent adversarial networks (CycleGANs) for converting styles with varying vocal effort, and focus on conversion between normal and Lombard styles as a case study of this problem. We propose a parametric approach that uses the Pulse Model in Log domain (PML) vocoder to extract speech features. These features are mapped using the CycleGAN from utterances in the source style to the corresponding features of target speech. Finally, the mapped features are converted to a Lombard speech waveform with the PML. The CycleGAN was compared in subjective listening tests with 2 other standard mapping methods used in conversion, and the CycleGAN was found to have the best performance in terms of speech quality and in terms of the magnitude of the perceptual change between the two styles.
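The abstract does not state the training objective. CycleGANs are generally trained with adversarial losses plus a cycle-consistency term, which is what allows learning from non-parallel data. The sketch below illustrates only that cycle-consistency term, assuming an L1 penalty. The names `G`, `F`, and `cycle_consistency_loss` are hypothetical, and the toy affine maps and random arrays stand in for trained generators and PML features; this is not the authors' implementation.

```python
import numpy as np

def cycle_consistency_loss(x_normal, y_lombard, G, F):
    """L1 error after a round trip through both mappings.

    G: hypothetical mapping from normal-style to Lombard-style features.
    F: hypothetical mapping from Lombard-style back to normal-style features.
    Inputs are (frames, dims) arrays; the two sets need not be parallel.
    """
    forward_cycle = np.mean(np.abs(F(G(x_normal)) - x_normal))
    backward_cycle = np.mean(np.abs(G(F(y_lombard)) - y_lombard))
    return forward_cycle + backward_cycle

# Toy stand-ins for trained generators (assumption): invertible affine maps.
A = np.eye(4) * 1.5
b = np.full(4, 0.2)
G = lambda x: x @ A + b
F = lambda y: (y - b) @ np.linalg.inv(A)

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 4))  # "normal" frames
y = rng.normal(size=(12, 4))  # "Lombard" frames, different count: non-parallel

# F exactly inverts G, so the round trip reproduces the input.
loss_perfect = cycle_consistency_loss(x, y, G, F)
# Replacing F with the identity breaks the cycle and the loss grows.
loss_mismatch = cycle_consistency_loss(x, y, G, lambda v: v)
```

Because the loss compares each utterance only with its own round-trip reconstruction, the source and target sets never need to be time-aligned or share linguistic content.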

“…SSC has been previously studied in whisper-to-normal conversion [3][4][5] and in normal-to-Lombard conversion [6][7][8]. In addition, a parametric approach to normal-to-Lombard SSC was recently explored in [9], where a vocoder was used to extract frame level features that were then transformed from normal to Lombard style using parallel data-driven mapping models, and then synthesized as speech in the target style using the same vocoder.…”

Section: Introduction (mentioning)

confidence: 99%

Augmented CycleGANs for Continuous Scale Normal-to-Lombard Speaking Style Conversion

Seshadri, Juvela, Alku et al. 2019

Interspeech 2019

Self Cite

Lombard speech is a speaking style associated with increased vocal effort that is naturally used by humans to improve intelligibility in the presence of noise. It is hence desirable to have a system capable of converting speech from normal to Lombard style. Moreover, it would be useful if one could adjust the degree of Lombardness in the converted speech so that the system is more adaptable to different noise environments. In this study, we propose the use of recently developed Augmented cycle-consistent adversarial networks (Augmented CycleGANs) for conversion between normal and Lombard speaking styles. The proposed system gives a smooth control on the degree of Lombardness of the mapped utterances by traversing through different points in the latent space of the trained model. We utilize a parametric approach that uses the Pulse Model in Log domain (PML) vocoder to extract features from normal speech that are then mapped to Lombard-style features using the Augmented CycleGAN. Finally, the mapped features are converted to Lombard speech with PML. The model is trained on multi-language data recorded in different noise conditions, and we compare its effectiveness to a previously proposed CycleGAN system in experiments for intelligibility and quality of mapped speech.

Spectral Tilt Estimation for Speech Intelligibility Enhancement Using RNN Based on All-Pole Model

Zhang et al. 2018

MultiMedia Modeling
