Head motion synthesis from speech using deep neural networks (2014)
DOI: 10.1007/s11042-014-2156-2

Cited by 47 publications (53 citation statements)
References 35 publications
“…It is similar to the work of [10] except that we did not use RBMs in pre-training. Acoustic and EMA features were concatenated from a context of five frames to the left and five frames to the right of the current frame, resulting in a 572-dimensional input vector.…”
Section: Experimental Setups
confidence: 97%
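The feature-stacking step described in this statement is easy to illustrate. Below is a minimal sketch, assuming 52-dimensional per-frame feature vectors (so that the 11 stacked frames give the 572 dimensions quoted above) and repeating the first/last frame at the sequence edges; the frame dimensionality and padding strategy are assumptions, not details from the cited work.

```python
import numpy as np

def stack_context(features, left=5, right=5):
    """Concatenate each frame with `left` preceding and `right` following
    frames, repeating the first/last frame at the sequence edges."""
    padded = np.concatenate(
        [np.repeat(features[:1], left, axis=0),
         features,
         np.repeat(features[-1:], right, axis=0)],
        axis=0,
    )
    windows = [padded[i:i + len(features)] for i in range(left + 1 + right)]
    return np.concatenate(windows, axis=1)

# Assumed 52-dim acoustic+EMA frames; +/-5 frames of context -> 52 * 11 = 572.
frames = np.random.randn(200, 52)
print(stack_context(frames).shape)  # (200, 572)
```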
“…Ding et al [10] were the first to use DNNs for speech-driven head motion synthesis. They pre-trained a deep belief network (DBN) with stacked restricted Boltzmann machines, then added a target layer on top of the DBN for parameter fine-tuning.…”
Section: Introduction
confidence: 99%
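A minimal sketch of that pre-train-then-fine-tune recipe is given below, using scikit-learn for the RBMs and PyTorch for the fine-tuning stage. The layer sizes, learning rates, and use of Bernoulli RBMs on min-max-scaled features (rather than Gaussian-Bernoulli RBMs on raw real-valued features) are illustrative assumptions, not the settings of Ding et al.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.neural_network import BernoulliRBM
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: 572-dim stacked acoustic inputs, 3-dim head-rotation targets.
X = np.random.rand(1000, 572)
Y = np.random.rand(1000, 3).astype(np.float32)
X = MinMaxScaler().fit_transform(X).astype(np.float32)

# Greedy layer-wise RBM pre-training (the deep belief network).
hidden_sizes = [256, 256]
rbms, layer_input = [], X
for size in hidden_sizes:
    rbm = BernoulliRBM(n_components=size, learning_rate=0.05, n_iter=10)
    layer_input = rbm.fit_transform(layer_input)
    rbms.append(rbm)

# Copy the pre-trained weights into a feed-forward network and add a
# linear target layer on top for the regression task.
layers, in_dim = [], X.shape[1]
for rbm, size in zip(rbms, hidden_sizes):
    linear = nn.Linear(in_dim, size)
    with torch.no_grad():
        linear.weight.copy_(torch.tensor(rbm.components_, dtype=torch.float32))
        linear.bias.copy_(torch.tensor(rbm.intercept_hidden_, dtype=torch.float32))
    layers += [linear, nn.Sigmoid()]
    in_dim = size
layers.append(nn.Linear(in_dim, Y.shape[1]))  # added target layer
model = nn.Sequential(*layers)

# Fine-tune the whole stack with backpropagation on a regression loss.
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
inputs, targets = torch.from_numpy(X), torch.from_numpy(Y)
for _ in range(20):
    optimiser.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimiser.step()
```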
“…DNNs were proposed as a modelling strategy for head motion prediction by Ding et al [13]. Using a deep Feed-Forward Neural Network (FFN) regression model to predict Euler angles of nod, yaw and roll, they were able to report advantages over the previous HMM based approaches and were able to avoid the problem of clustering motion.…”
Section: Introduction
confidence: 99%
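For contrast with the pre-trained stack sketched earlier, the purely feed-forward regression model described here, mapping a stacked acoustic input to the three Euler angles, only needs a few lines; the layer widths and activation are placeholders rather than the configuration of [13].

```python
import torch.nn as nn

# Hypothetical sizes: 572-dim stacked acoustic input -> (nod, yaw, roll).
ffn = nn.Sequential(
    nn.Linear(572, 512), nn.Tanh(),
    nn.Linear(512, 512), nn.Tanh(),
    nn.Linear(512, 3),   # Euler angles for the current frame
)
# Trained per frame with an MSE loss, as in the fine-tuning loop above.
```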
“…Another example by Sutskever et al [17] reports state of the art performance for the language translation task. Ding et al [18] introduced Bi-Directional Long Short Term Memory (BLSTM) networks to the head motion task, noting improvements over their own earlier work [13]. More recently Haag [19] uses BLSTMs and Bottleneck features [20] and noted a subtle improvement.…”
Section: Introduction
confidence: 99%
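A BLSTM regressor of the kind referred to in this statement can be sketched as below; the input dimensionality, hidden size, and number of layers are illustrative placeholders rather than the settings of [18] or [19].

```python
import torch
import torch.nn as nn

class BLSTMHeadMotion(nn.Module):
    """Bidirectional LSTM mapping per-frame acoustic features to
    per-frame head rotation (nod, yaw, roll)."""

    def __init__(self, in_dim=26, hidden=128, out_dim=3):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, out_dim)  # forward + backward states

    def forward(self, x):        # x: (batch, frames, in_dim)
        h, _ = self.blstm(x)     # h: (batch, frames, 2 * hidden)
        return self.out(h)       # (batch, frames, out_dim)

model = BLSTMHeadMotion()
speech = torch.randn(4, 300, 26)  # 4 utterances, 300 frames each
angles = model(speech)            # (4, 300, 3)
```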
“…Related work has also converted acoustic speech features (e.g. filter bank, MFCC, LPC) into head motion parameters (nod, yaw, roll) using a feed-forward neural network model [18]. This paper continues with the DNN-based approach for predicting visual features from a text input but aims to improve the resulting naturalness of the animation.…”
Section: Introduction
confidence: 99%
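The acoustic front-ends named here (filter bank, MFCC) can be computed with a standard library such as librosa; the frame settings and file name below are illustrative assumptions, not the configuration used in [18].

```python
import numpy as np
import librosa

# Hypothetical input file and frame settings (10 ms hop, 25 ms window at 16 kHz).
wav, sr = librosa.load("utterance.wav", sr=16000)

# 13 MFCCs per frame.
mfcc = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13,
                            hop_length=160, n_fft=400)         # (13, num_frames)

# Log mel filter-bank energies as an alternative front-end.
fbank = librosa.power_to_db(
    librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=26,
                                   hop_length=160, n_fft=400))  # (26, num_frames)

# Frame-wise feature matrix to feed the neural network.
features = np.concatenate([mfcc, fbank]).T                      # (num_frames, 39)
```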