2019 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2019.8793720

Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots

Abstract: Co-speech gestures enhance interaction experiences between humans as well as between humans and robots. Existing robots use rule-based speech-gesture association, but implementing this requires human labor and expert prior knowledge. We present a learning-based co-speech gesture generation model learned from 52 hours of TED talks. The proposed end-to-end neural network model consists of an encoder for speech text understanding and a decoder to generate a sequence of gestures. The model successfully prod…
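The encoder-decoder idea in the abstract can be sketched as a toy autoregressive model: an encoder summarizes the utterance text into a hidden state, and a decoder emits one pose (gesture frame) per step. This is a minimal illustration with made-up dimensions and weight names, not the paper's actual architecture or trained parameters.

```python
import numpy as np

# Toy sketch of a text-to-gesture encoder-decoder. All sizes, weights,
# and function names are illustrative assumptions, not the paper's model.
rng = np.random.default_rng(0)

VOCAB, EMB, HID, POSE_DIM = 50, 16, 32, 10  # toy dimensions

W_emb = rng.standard_normal((VOCAB, EMB)) * 0.1          # word embeddings
W_enc = rng.standard_normal((EMB + HID, HID)) * 0.1      # encoder RNN weights
W_dec = rng.standard_normal((POSE_DIM + HID, HID)) * 0.1 # decoder RNN weights
W_out = rng.standard_normal((HID, POSE_DIM)) * 0.1       # hidden -> pose

def encode(token_ids):
    """Run a simple tanh RNN over word embeddings; return the final state."""
    h = np.zeros(HID)
    for t in token_ids:
        h = np.tanh(np.concatenate([W_emb[t], h]) @ W_enc)
    return h

def decode(h, n_frames):
    """Autoregressively generate a sequence of pose vectors."""
    pose = np.zeros(POSE_DIM)
    frames = []
    for _ in range(n_frames):
        h = np.tanh(np.concatenate([pose, h]) @ W_dec)
        pose = h @ W_out
        frames.append(pose)
    return np.stack(frames)

gestures = decode(encode([3, 17, 42]), n_frames=8)
print(gestures.shape)  # (8, 10): eight frames of a 10-D pose vector
```

In the paper the decoder output would drive a humanoid's joints; here the pose is just a placeholder vector.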

Cited by 171 publications (198 citation statements)
References 19 publications
“…Ishi et al [22] generated gestures from text input through a series of probabilistic functions: Words were mapped to word concepts using WordNet [34], which then were mapped to a gesture function (e.g., iconic or beat), which in turn were mapped to clusters of 3D hand gestures. Yoon et al [48] learned a mapping from the utterance text to gestures using a recurrent neural network. The produced gestures were aligned with audio in a post-processing step.…”
Section: 2.2
Mentioning confidence: 99%
“…Instead, they rely on postprocessing to increase smoothness as in [19]. Yoon et al [48] include a velocity penalty in training that discourages jerky motion. The recurrent connections used in several models [13,19,48] can also act as a pose memory that may help the model to produce smooth output motion.…”
Section: Regarding Motion Continuity
Mentioning confidence: 99%
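The velocity penalty mentioned in the quote above can be sketched as a first-difference loss term: penalize large frame-to-frame pose changes so the generated motion stays smooth. This is a toy illustration of the general idea, with an assumed weight and shapes, not the paper's exact loss.

```python
import numpy as np

def velocity_penalty(poses, weight=1.0):
    """Mean squared frame-to-frame difference of a (T, D) pose sequence.

    `weight` is an illustrative loss coefficient, not a value from the paper.
    """
    diffs = np.diff(poses, axis=0)   # (T-1, D) per-frame velocities
    return weight * np.mean(diffs ** 2)

# Steady motion: each joint moves in equal small steps across 5 frames.
smooth = np.tile(np.linspace(0.0, 1.0, 5)[:, None], (1, 3))
jerky = smooth.copy()
jerky[2] += 5.0                      # inject a sudden jump at frame 2

print(velocity_penalty(smooth), velocity_penalty(jerky))
```

Added to the main regression loss during training, such a term discourages exactly the kind of sudden jump injected in the example.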
“…Lastly, the learned model can be applied to a humanoid robot so that the robot's speech is accompanied by appropriate co-speech gestures, for instance on the NAO robot as in [39].…”
Section: Future Work
Mentioning confidence: 99%