Interspeech 2018
DOI: 10.21437/interspeech.2018-1904
A Neural Model to Predict Parameters for a Generalized Command Response Model of Intonation

Abstract: The Generalised Command Response (GCR) model is a time-local model of intonation that has been shown to lend itself to (cross-language) transfer of emphasis. In order to generalise the model to longer prosodic sequences, we show that it can be driven by a recurrent neural network emulating a spiking neural network. We show that a loss function for error backpropagation can be formulated analogously to that of the Spike Pattern Association Neuron (SPAN) method for spiking networks. The resulting system is able t…
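The abstract does not spell the loss out, but the SPAN idea it refers to is to convolve both the actual and the desired spike trains with a smoothing kernel, turning them into differentiable analogue traces, and then integrate the squared difference. A minimal sketch of that idea follows; the alpha-shaped kernel and the time constants are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def alpha_kernel(t, tau=0.01):
    """Alpha-shaped kernel used to smooth spike trains into
    differentiable analogue signals (tau is illustrative)."""
    k = np.zeros_like(t)
    m = t >= 0
    k[m] = (t[m] / tau) * np.exp(1 - t[m] / tau)  # peaks at 1 when t == tau
    return k

def span_style_loss(actual, desired, dt=0.001, tau=0.01):
    """SPAN-style error: convolve predicted and target spike trains
    with the kernel and integrate the squared difference."""
    t = np.arange(0.0, 10 * tau, dt)
    k = alpha_kernel(t, tau)
    a = np.convolve(actual, k)[:len(actual)]
    d = np.convolve(desired, k)[:len(desired)]
    return 0.5 * np.sum((d - a) ** 2) * dt
```

Because the kernel smears each spike over time, small timing errors between predicted and target spikes produce small, smoothly varying losses, which is what makes gradient-based training of the spike positions possible.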

Cited by 3 publications (3 citation statements)
References 24 publications (28 reference statements)
“…To this end, we use a DNN-based state-of-the-art Merlin TTS system in conjunction with the Festival front-end, two Bidirectional Long Short-Term Memory networks as duration and acoustic models, and the WORLD vocoder. For details on the TTS systems and the training procedure, the reader is referred to [23,24]. By training a TTS system for each speaker, we get 4 speaker-dependent TTS systems.…”
Section: Algorithmic Settings, Evaluation and State-of-the-art Measures (mentioning)
confidence: 99%
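For orientation only: a bidirectional LSTM acoustic model of the kind built by Merlin's recipes might look as follows in PyTorch. The framework choice, layer count, hidden size, and feature dimensions are assumptions for illustration, not the cited configuration.

```python
import torch
import torch.nn as nn

class BLSTMAcousticModel(nn.Module):
    """Illustrative bidirectional LSTM acoustic model in the spirit of
    Merlin's demo recipes; all dimensions here are assumptions."""
    def __init__(self, in_dim=425, hidden=512, out_dim=187):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, out_dim)

    def forward(self, x):        # x: (batch, frames, in_dim) linguistic features
        h, _ = self.blstm(x)     # (batch, frames, 2 * hidden)
        return self.proj(h)      # framewise vocoder parameters
```

In a Merlin-style pipeline, one such network maps linguistic features to durations and a second maps frame-level features to vocoder parameters, which the WORLD vocoder then renders to speech.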
“…The E2E system is initialized with a pre-trained atom model, which uses the same topology and training as described in our previous work [3]. At first the E2E model is trained for 50 epochs (LR of 0.001), without the phrase bias, on LF0 from which the phrase contribution is removed.…”
Section: Network Topologies and Training (mentioning)
confidence: 99%
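The quoted schedule (50 epochs at a learning rate of 0.001, fitting log-F0 from which the phrase contribution has been removed) could be sketched as below. The network, the Adam optimiser, and the synthetic data are placeholders, not the authors' pre-trained atom model or corpus.

```python
import torch
import torch.nn as nn

# Sketch of the cited schedule: 50 epochs at LR 0.001, regressing
# phrase-removed log-F0. Everything below is a stand-in.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(100, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

feats = torch.randn(32, 100)       # stand-in linguistic features
lf0_residual = torch.randn(32, 1)  # stand-in log-F0 minus phrase contribution

for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(feats), lf0_residual)
    loss.backward()
    opt.step()
```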
“…We studied how a Recurrent Neural Network (RNN) can generate the command signals of the GCR model to generate intonation by emulating a spiking neural network [3]. The RNN predicts the position and amplitude of the command spikes for a given text, which are filtered by the GCR muscle models to generate the pitch contour.…”
mentioning
confidence: 99%
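As a rough sketch of that pipeline: the GCR muscle model can be taken as the impulse response of a k-th order critically damped system, and convolving a predicted spike train with that kernel yields the accent contribution to the log-F0 contour. The order, time constant, frame rate, and spike placements below are illustrative assumptions, not values from the cited work.

```python
import numpy as np
from scipy.special import gamma as gamma_fn

def gcr_kernel(t, k=6, theta=0.02):
    """Impulse response of a k-th order critically damped system,
    standing in for the GCR muscle model (k, theta illustrative)."""
    g = np.zeros_like(t)
    m = t > 0
    g[m] = t[m] ** (k - 1) * np.exp(-t[m] / theta) / (gamma_fn(k) * theta ** k)
    return g / g.max()  # peak-normalise

fs = 200                                  # assumed frame rate (frames/s)
kernel = gcr_kernel(np.arange(0, 0.5, 1 / fs))

spikes = np.zeros(600)                    # 3 s utterance at 200 frames/s
spikes[[40, 180, 350]] = [0.8, -0.3, 0.5] # positions/amplitudes an RNN would predict
lf0_accent = np.convolve(spikes, kernel)[:len(spikes)]  # accent part of log-F0
```

The division of labour matches the quoted description: the network only has to place sparse, signed command spikes, and the fixed GCR filter turns them into a smooth pitch contour.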