2019
DOI: 10.1007/s10846-019-01100-3

Part-of-Speech and Prosody-based Approaches for Robot Speech and Gesture Synchronization

Abstract: Humanoid robots are already among us and they are beginning to assume more social and personal roles, like guiding and assisting people. Thus, they should interact in a human-friendly manner, using not only verbal cues but also synchronized non-verbal and para-verbal cues. However, available robots are not able to communicate in this multimodal way, being just able to perform predefined gesture sequences, handcrafted to accompany specific utterances. In the current paper, we propose a model based on three diff…

Cited by 11 publications (13 citation statements)
References 27 publications
“…Speech and gestures are synchronized by dividing the speech in smaller audio chunks, and then generating motions for each chunk, with the same duration. That same year, Pérez-Mayos et al [196] proposed a model that uses three different approaches for speech-gesture synchronization. The first approach starts by identifying keywords in the text connected to gestures in the database.…”
Section: Comparison Of Co-speech Gesture Prediction/generation Methods
Citation type: mentioning
confidence: 99%
“…Another example is the work of Kucherenko et al [112], where the BERT encoding of the speech transcription and temporal information about how the sentence is uttered is combined with log-power mel-spectrogram features extracted from the audio signal. Pérez-Mayos et al [196] decided to use the prosody of the speech to select beat gestures, while text is used for the remaining categories.…”
Section: Multimodality
Citation type: mentioning
confidence: 99%
“…Their joints have different degrees of freedom (DOF), movable ranges are not the same, etc. Therefore, original motions must be modified to be feasible by the robot, i.e., the captured movements must be correctly mapped by satisfying several constraints (see [22] for a good overview of every aspect of the motion imitation task).…”
Section: Mapping: Translating Human Motion To Robot Motion
Citation type: mentioning
confidence: 99%