Part-of-Speech and Prosody-based Approaches for Robot Speech and Gesture Synchronization

Pérez-Mayos, Laura; Farrús, Mireia; Segura, Jordi Adell

doi:10.1007/s10846-019-01100-3

Cited by 11 publications

(13 citation statements)

References 27 publications

“…Speech and gestures are synchronized by dividing the speech in smaller audio chunks, and then generating motions for each chunk, with the same duration. That same year, Pérez-Mayos et al [196] proposed a model that uses three different approaches for speech-gesture synchronization. The first approach starts by identifying keywords in the text connected to gestures in the database.…”

Section: Comparison Of Co-speech Gesture Prediction/generation Methods (mentioning)

confidence: 99%

“…Another example is the work of Kucherenko et al [112], where the BERT encoding of the speech transcription and temporal information about how the sentence is uttered is combined with log-power mel-spectrogram features extracted from the audio signal. Pérez-Mayos et al [196] decided to use the prosody of the speech to select beat gestures, while text is used for the remaining categories.…”

Section: Multimodality (mentioning)

confidence: 99%

“…Their model then predicts which one of the clusters obtained should be selected based on the speech, and a gesture is instantiated based on the position of said cluster's centroid. Authors like Pérez-Mayos et al [196] or Xiao et al [203] handcrafted their gestures using the tools provided by the robotic platforms used in their works (in both cases, a Nao robot). Pérez-Mayos's work synchronized these predefined expressions by associating their triggering to either keywords or pitch peak patterns in the utterance.…”

Section: Gesture Design (mentioning)

confidence: 99%

“…Ishi et al [192] decided to use conditional probabilities to represent the connections between speech content and gesture functions, and between functions and motions. Pérez-Mayos et al [196] combined a part-of-speech based approach, where gestures were associated with keywords in the speech, while beat gestures were generated based on the prosodic content (in particular, the pitch graph). Finally, the work presented by Xiao et al [203] focuses on finding the correlation between the dissimilarities between two sentences and the dissimilarities between their associated behaviours.…”

Section: Algorithms Used (mentioning)

confidence: 99%

A Social Robot Assisting in Cognitive Stimulation Therapy

Salichs

Fernández-Rodicio

Castillo

et al. 2018

Advances in Practical Applications of Agents, Multi-Agent Systems, and Complexity: The PAAMS Collection


Society is experiencing a series of demographic changes that can result in an imbalance between the active working and non-working age populations. One of the solutions considered to mitigate this problem is the inclusion of robots in multiple sectors, including the service sector. But for this to be a viable solution, among other features, robots need to be able to interact with humans successfully. This thesis seeks to endow a social robot with the abilities required for natural human-robot interaction. The main objective is to contribute to the body of knowledge in the area of Human-Robot Interaction with a new, platform-independent, modular approach that focuses on giving roboticists the tools required to develop applications that involve interactions with humans. In particular, this thesis focuses on three problems that need to be addressed: (i) modelling interactions between a robot and a user; (ii) endowing the robot with the expressive capabilities required for successful communication; and (iii) endowing the robot with a lively appearance.

The approach to dialogue modelling presented in this thesis proposes to model dialogues as a sequence of atomic interaction units, called Communicative Acts, or CAs. They can be parametrized at runtime to achieve different communicative goals, and are endowed with mechanisms oriented to solve some of the uncertainties related to interaction. Two dimensions have been used to identify the required CAs: initiative (the robot or the user), and intention (either to retrieve information or to convey it). These basic CAs can be combined in a hierarchical manner to create more reusable complex structures. This approach simplifies the creation of new interactions by allowing developers to focus exclusively on designing the flow of the dialogue, without having to re-implement functionalities that are common to all dialogues (like error handling, for example).

The expressiveness of the robot is based on the use of a library of predefined multimodal gestures, or expressions, modelled as state machines. The module managing the expressiveness receives requests for performing gestures, schedules their execution in order to avoid any possible conflict that might arise, loads them, and ensures that their execution goes without problems. The proposed approach is also able to generate expressions at runtime based on a list of unimodal actions (an utterance, the motion of a limb, etc.). One of the key features of the proposed expressiveness management approach is the integration of a series of modulation techniques that can be used to modify the robot's expressions at runtime. This would allow the robot to adapt them to the particularities of a given situation (which would also increase the variability of the robot's expressiveness), and to display different internal states with the same expressions.

“…Their joints have different degrees of freedom (DOF), movable ranges are not the same, etc. Therefore, original motions must be modified to be feasible by the robot, i.e., the captured movements must be correctly mapped by satisfying several constraints (see [22] for a good overview of every aspect of the motion imitation task).…”

Section: Mapping: Translating Human Motion To Robot Motion (mentioning)

confidence: 99%

Modeling and evaluating beat gestures for social robots

Zabala

Rodriguez

Martínez-Otzeta

et al. 2021

Multimed Tools Appl


Natural gestures are a desirable feature for a humanoid robot, as they are presumed to elicit a more comfortable interaction in people. With this aim in mind, we present in this paper a system to develop a natural talking gesture generation behavior. A Generative Adversarial Network (GAN) produces novel beat gestures from the data captured from recordings of humans talking. The data is obtained without the need for any kind of wearable, as a motion capture system properly estimates the position of the limbs/joints involved in human expressive talking behavior. After testing on a Pepper robot, it is shown that the system is able to generate natural gestures during long talking periods without becoming repetitive. This approach is computationally more demanding than previous work; therefore, a comparison is made in order to evaluate the improvements. This comparison is made by calculating some common measures of the end effectors’ trajectories (jerk and path lengths), complemented by the Fréchet Gesture Distance (FGD), which aims to measure the fidelity of the generated gestures with respect to the provided ones. Results show that the described system is able to learn natural gestures just by observation and improves on the one developed with a simpler motion capture system. The quantitative results are sustained by a questionnaire-based human evaluation.
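The two trajectory measures named in the abstract above, path length and jerk, can be illustrated with a minimal sketch. The abstract does not give the exact definitions the authors used, so everything here is an assumption: the function names `path_length` and `mean_jerk` are hypothetical, the trajectory is taken to be a list of 3D end-effector positions sampled at a fixed interval `dt`, and jerk is estimated with a simple forward third finite difference.

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutive position samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def mean_jerk(points, dt):
    """Mean magnitude of jerk (third time derivative of position).

    Assumption: jerk is approximated by the forward third finite
    difference (p3 - 3*p2 + 3*p1 - p0) / dt**3 on uniformly sampled data.
    """
    if len(points) < 4:
        raise ValueError("need at least 4 samples to estimate jerk")
    magnitudes = []
    for p0, p1, p2, p3 in zip(points, points[1:], points[2:], points[3:]):
        # Third difference computed per coordinate, then combined as a norm.
        jerk = [(d - 3 * c + 3 * b - a) / dt**3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        magnitudes.append(math.hypot(*jerk))
    return sum(magnitudes) / len(magnitudes)


# Hypothetical trajectories, sampled once per second.
straight = [(float(t), 0.0, 0.0) for t in range(4)]    # constant velocity
cubic = [(float(t) ** 3, 0.0, 0.0) for t in range(5)]  # x = t^3

print(path_length(straight))
print(mean_jerk(straight, dt=1.0))
print(mean_jerk(cubic, dt=1.0))
```

Under these assumptions, a lower mean jerk indicates a smoother motion, and path length indicates how far the end effector travels overall. Real motion-capture data is noisy, so a practical pipeline would likely smooth the positions before differentiating.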
