Interspeech 2018
DOI: 10.21437/interspeech.2018-1904
A Neural Model to Predict Parameters for a Generalized Command Response Model of Intonation

Abstract: The Generalised Command Response (GCR) model is a time-local model of intonation that has been shown to lend itself to (cross-language) transfer of emphasis. In order to generalise the model to longer prosodic sequences, we show that it can be driven by a recurrent neural network emulating a spiking neural network. We show that a loss function for error backpropagation can be formulated analogously to that of the Spike Pattern Association Neuron (SPAN) method for spiking networks. The resulting system is able t…
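The abstract does not spell the loss out, but the SPAN idea it refers to is to convolve both the actual and the desired spike trains with a smoothing kernel, turning them into differentiable analogue traces, and then integrate the squared difference. A minimal sketch of that idea follows; the alpha-shaped kernel and the time constants are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def alpha_kernel(t, tau=0.01):
    """Alpha-shaped kernel used to smooth spike trains into
    differentiable analogue signals (tau is illustrative)."""
    k = np.zeros_like(t)
    m = t >= 0
    k[m] = (t[m] / tau) * np.exp(1 - t[m] / tau)  # peaks at 1 when t == tau
    return k

def span_style_loss(actual, desired, dt=0.001, tau=0.01):
    """SPAN-style error: convolve predicted and target spike trains
    with the kernel and integrate the squared difference."""
    t = np.arange(0.0, 10 * tau, dt)
    k = alpha_kernel(t, tau)
    a = np.convolve(actual, k)[:len(actual)]
    d = np.convolve(desired, k)[:len(desired)]
    return 0.5 * np.sum((d - a) ** 2) * dt
```

Because the kernel smears each spike over time, small timing errors between predicted and target spikes produce small, smoothly varying losses, which is what makes gradient-based training of the spike positions possible.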

Cited by 3 publications (3 citation statements)
References 24 publications (28 reference statements)
“…To this end, we use a DNN-based state-of-the-art Merlin TTS system in conjunction with the Festival front-end, two Bidirectional Long Short-Term Memory networks as duration and acoustic models, and the WORLD vocoder. For details on the TTS systems and the training procedure, the reader is referred to [23,24]. By training a TTS system for each speaker, we get 4 speaker-dependent TTS systems.…”
Section: Algorithmic Settings, Evaluation and State-of-the-art Measures (mentioning)
confidence: 99%
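For orientation only: a bidirectional LSTM acoustic model of the kind built by Merlin's recipes might look as follows in PyTorch. The framework choice, layer count, hidden size, and feature dimensions are assumptions for illustration, not the cited configuration.

```python
import torch
import torch.nn as nn

class BLSTMAcousticModel(nn.Module):
    """Illustrative bidirectional LSTM acoustic model in the spirit of
    Merlin's demo recipes; all dimensions here are assumptions."""
    def __init__(self, in_dim=425, hidden=512, out_dim=187):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, out_dim)

    def forward(self, x):        # x: (batch, frames, in_dim) linguistic features
        h, _ = self.blstm(x)     # (batch, frames, 2 * hidden)
        return self.proj(h)      # framewise vocoder parameters
```

In a Merlin-style pipeline, one such network maps linguistic features to durations and a second maps frame-level features to vocoder parameters, which the WORLD vocoder then renders to speech.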
“…The E2E system is initialized with a pre-trained atom model, which uses the same topology and training as described in our previous work [3]. At first the E2E model is trained for 50 epochs (LR of 0.001), without the phrase bias, on LF0 from which the phrase contribution is removed.…”
Section: Network Topologies and Training (mentioning)
confidence: 99%
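The quoted schedule (50 epochs at a learning rate of 0.001, fitting log-F0 from which the phrase contribution has been removed) could be sketched as below. The network, the Adam optimiser, and the synthetic data are placeholders, not the authors' pre-trained atom model or corpus.

```python
import torch
import torch.nn as nn

# Sketch of the cited schedule: 50 epochs at LR 0.001, regressing
# phrase-removed log-F0. Everything below is a stand-in.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(100, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

feats = torch.randn(32, 100)       # stand-in linguistic features
lf0_residual = torch.randn(32, 1)  # stand-in log-F0 minus phrase contribution

for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(feats), lf0_residual)
    loss.backward()
    opt.step()
```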
“…We studied how a Recurrent Neural Network (RNN) can generate the command signals of the GCR model to generate intonation by emulating a spiking neural network [3]. The RNN predicts the position and amplitude of the command spikes for a given text, which are filtered by the GCR muscle models to generate the pitch contour.…”
mentioning
confidence: 99%
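As a rough sketch of that pipeline: the GCR muscle model can be taken as the impulse response of a k-th order critically damped system, and convolving a predicted spike train with that kernel yields the accent contribution to the log-F0 contour. The order, time constant, frame rate, and spike placements below are illustrative assumptions, not values from the cited work.

```python
import numpy as np
from scipy.special import gamma as gamma_fn

def gcr_kernel(t, k=6, theta=0.02):
    """Impulse response of a k-th order critically damped system,
    standing in for the GCR muscle model (k, theta illustrative)."""
    g = np.zeros_like(t)
    m = t > 0
    g[m] = t[m] ** (k - 1) * np.exp(-t[m] / theta) / (gamma_fn(k) * theta ** k)
    return g / g.max()  # peak-normalise

fs = 200                                  # assumed frame rate (frames/s)
kernel = gcr_kernel(np.arange(0, 0.5, 1 / fs))

spikes = np.zeros(600)                    # 3 s utterance at 200 frames/s
spikes[[40, 180, 350]] = [0.8, -0.3, 0.5] # positions/amplitudes an RNN would predict
lf0_accent = np.convolve(spikes, kernel)[:len(spikes)]  # accent part of log-F0
```

The division of labour matches the quoted description: the network only has to place sparse, signed command spikes, and the fixed GCR filter turns them into a smooth pitch contour.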