9th ISCA Speech Synthesis Workshop (SSW 9), 2016
DOI: 10.21437/ssw.2016-28

Contextual Representation using Recurrent Neural Network Hidden State for Statistical Parametric Speech Synthesis

Abstract: In this paper, we propose using the hidden state vector obtained from a recurrent neural network (RNN) as a context vector representation for deep neural network (DNN) based statistical parametric speech synthesis. In a typical DNN-based system, there is a hierarchy of text features from the phone level to the utterance level, but they are usually in a 1-hot-k encoded representation. Our hypothesis is that supplementing the conventional text features with a continuous, frame-level, acoustically guided representation would i…
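
The idea in the abstract, appending an RNN hidden state to the conventional text features before they reach the DNN, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the plain Elman-style recurrence, the toy dimensions, the random (untrained) weights, and the helper names `rnn_hidden_states` and `augment_with_context` are all hypothetical. In particular, the abstract describes the representation as acoustically guided, which suggests a trained RNN; no training is shown here.

```python
import numpy as np


def rnn_hidden_states(frames, w_in, w_rec, bias):
    """Run a plain Elman-style RNN over the frames; return one hidden state per frame."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x in frames:
        # The new state depends on the current input and the previous state.
        hidden = np.tanh(w_in @ x + w_rec @ hidden + bias)
        states.append(hidden)
    return np.stack(states)


def augment_with_context(frames, states):
    """Concatenate each frame's text features with its continuous context vector."""
    return np.concatenate([frames, states], axis=1)


rng = np.random.default_rng(0)
num_frames, text_dim, hidden_dim = 5, 8, 4  # hypothetical toy sizes

# Toy one-hot text features, one row per frame (assumption: real systems use
# a much richer hierarchy of phone- to utterance-level features).
text_features = np.eye(text_dim)[rng.integers(0, text_dim, size=num_frames)]

# Random weights stand in for a trained RNN (assumption).
w_in = rng.normal(scale=0.1, size=(hidden_dim, text_dim))
w_rec = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
bias = np.zeros(hidden_dim)

context = rnn_hidden_states(text_features, w_in, w_rec, bias)
dnn_input = augment_with_context(text_features, context)
print(dnn_input.shape)  # (5, 12)
```

Each row of `dnn_input` keeps the original one-hot features in its first `text_dim` columns and carries the continuous hidden state in the remaining `hidden_dim` columns, so a downstream DNN would see both representations side by side.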

Citations: cited by 1 publication (1 citation statement)
References: 17 publications (18 reference statements)
“…During training, deep variational autoencoder architecture (with Sigmoid, Hyperbolic tangent, Linear and Relu activation function) ignore the sequential nature of laughter. So, the better choice to include this special feature for audio signals is the use of the Recurrent Neural Network (RNN) [36]. RNN are known by their capacities in memorizing information learnt from prior inputs when generating outputs.…”
Section: Lstm-vae
Citation type: mentioning
confidence: 99%