During speech, people spontaneously gesticulate, which plays a key role in conveying information. Similarly, realistic co-speech gestures are crucial to enable natural and smooth interactions with social agents. Current end-to-end co-speech gesture generation systems use a single modality for representing speech: either audio or text. These systems are therefore confined to producing either acoustically-linked beat gestures or semantically-linked gesticulation (e.g., raising a hand when saying "high"); they cannot appropriately learn to generate both gesture types. We present a model designed to produce arbitrary beat and semantic gestures together. Our deep-learning-based model takes both acoustic and semantic representations of speech as input and generates gestures as a sequence of joint angle rotations as output. The resulting gestures can be applied to both virtual agents and humanoid robots. Subjective and objective evaluations confirm the success of our approach. The code and video are available at the project page svito-zar.github.io/gesticulator.
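The two-modality input described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fuse_features`, the word-timing tuples, and the zero vector used for silence are all assumptions made for illustration. It only shows one plausible way to align per-word text embeddings with per-frame audio features before they are passed to a gesture model.

```python
from typing import List, Sequence, Tuple

# Hypothetical word record: (start time in s, end time in s, text embedding).
Word = Tuple[float, float, Sequence[float]]


def fuse_features(
    audio_frames: Sequence[Sequence[float]],
    words: Sequence[Word],
    frame_rate: float,
    text_dim: int,
) -> List[List[float]]:
    """Concatenate each audio frame with the embedding of the word spoken then.

    Frames that fall outside every word interval get a zero text vector
    (an assumption made here to represent silence).
    """
    fused = []
    for index, audio in enumerate(audio_frames):
        time = index / frame_rate
        text = [0.0] * text_dim
        for start, end, embedding in words:
            if start <= time < end:
                text = list(embedding)
                break
        fused.append(list(audio) + text)
    return fused


if __name__ == "__main__":
    frames = [[0.1], [0.2], [0.3]]            # one acoustic feature per frame
    spoken = [(0.0, 0.15, [1.0, 2.0])]        # one word with a 2-d embedding
    for row in fuse_features(frames, spoken, frame_rate=10.0, text_dim=2):
        print(row)
```

Each fused row carries both acoustic and semantic information for the same time step, which is the property the abstract attributes to the model's input.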
Virtual Reality (VR) is gaining popularity as a research tool in the field of Human-Robot Interaction (HRI). To fully exploit the potential of VR and benefit HRI studies, we need to establish a basic understanding of the relationship between physical, real-world interaction (Live) and VR. This study compared Live and VR HRI with a focus on proxemics, as proxemic preferences can reflect comprehensive human intuition, making them suitable for comparing Live and VR. To evaluate the influence of different modalities in VR, virtual scenes with different visual familiarity and spatial sound were compared as well. Lab experiments were conducted with a physical Pepper robot and its virtual copy. In both Live and VR, proxemic preferences, the perception of the robot (competence and discomfort), and the feeling of presence were measured and compared. Results suggest that proxemic preferences do not remain consistent between Live and VR, which could be influenced by the perception of the robot. Therefore, when conducting HRI experiments in VR, the perceptions of the robot need to be compared before the experiments. Results also indicate freedom within VR HRI, as different VR settings are consistent with each other.
Figure 1: A human user instructing a dual-arm robot to pick and place objects: a) the human utters an instruction, b) the robot attempts to grasp the object, c) the robot indicates incapability through sudden arm movement. Even though the robot does not have a head and cannot speak, it affords interactional phenomena through non-verbal behaviour. Experiment published at [26].
A longstanding barrier to deploying robots in the real world is the ongoing need to author robot behavior. Remote data collection, particularly crowdsourcing, is receiving increasing interest. In this paper, we argue for scaling robot programming to the crowd and present an initial investigation of the feasibility of this proposed method. Using an off-the-shelf visual programming interface, non-experts created simple robot programs for two typical robot tasks (navigation and pick-and-place). Each task comprised four subtasks that required an increasing number of programming statements (if statement, while loop, variables) for successful completion. Initial findings of an online study (N = 279) indicate that non-experts, after minimal instruction, were able to create simple programs using this interface. We discuss our findings and identify future avenues for this line of research.
Reinforcement learning has shown great potential for learning sequential decision-making tasks. Yet, it is difficult to anticipate all possible real-world scenarios during training, causing robots to inevitably fail in the long run. Many of these failures are due to variations in the robot's environment. Usually experts are called to correct the robot's behavior; however, some of these failures do not necessarily require an expert to solve them. In this work, we query non-experts online for help and explore 1) if/how non-experts can provide feedback to the robot after a failure and 2) how the robot can use this feedback to avoid such failures in the future by generating shields that restrict or correct its high-level actions. We demonstrate our approach on common daily scenarios of a simulated kitchen robot. The results indicate that non-experts can indeed understand and repair robot failures. Our generated shields accelerate learning and improve data-efficiency during retraining.
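The idea of a shield that restricts or corrects high-level actions can be sketched as follows. This is a minimal illustration under stated assumptions, not the method from the paper: the `Shield` class, its method names, and the kitchen state and action labels are all hypothetical, and the rules are added by hand here rather than generated from non-expert feedback.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Key = Tuple[Hashable, Hashable]  # (state, action)


class Shield:
    """Filters an agent's high-level actions using rules learned from failures."""

    def __init__(self) -> None:
        self._blocked: Set[Key] = set()
        self._corrections: Dict[Key, Hashable] = {}

    def block(self, state: Hashable, action: Hashable) -> None:
        """Restrict: forbid `action` in `state`."""
        self._blocked.add((state, action))

    def correct(self, state: Hashable, action: Hashable, replacement: Hashable) -> None:
        """Correct: substitute `replacement` whenever `action` is chosen in `state`."""
        self._corrections[(state, action)] = replacement

    def allowed(self, state: Hashable, actions: Sequence[Hashable]) -> List[Hashable]:
        """Return the actions that are not blocked in `state`."""
        return [a for a in actions if (state, a) not in self._blocked]

    def apply(self, state: Hashable, action: Hashable, actions: Sequence[Hashable]) -> Hashable:
        """Return a safe action: corrected, replaced by the first allowed one, or unchanged."""
        key = (state, action)
        if key in self._corrections:
            return self._corrections[key]
        if key in self._blocked:
            candidates = self.allowed(state, actions)
            if not candidates:
                raise ValueError(f"no allowed action in state {state!r}")
            return candidates[0]
        return action


if __name__ == "__main__":
    shield = Shield()
    # Hypothetical feedback: grasping a hot pan failed, so use a mitt first.
    shield.correct("pan_hot", "grasp_pan", "fetch_mitt")
    shield.block("cup_full", "tilt_cup")
    print(shield.apply("pan_hot", "grasp_pan", ["grasp_pan", "fetch_mitt", "wait"]))
    print(shield.apply("cup_full", "tilt_cup", ["tilt_cup", "carry_cup"]))
```

During retraining, a learner would call `apply` before executing each chosen action, so that known failure cases are never repeated while the rest of the action space stays open to exploration.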
Driving styles play a major role in the acceptance and use of autonomous vehicles. Yet, existing motion planning techniques can often only incorporate simple driving styles that are modeled by the developers of the planner and not tailored to the passenger. We present a new approach to encode human driving styles through the use of signal temporal logic and its robustness metrics. Specifically, we use a penalty structure that can be used in many motion planning frameworks, and calibrate its parameters to model different automated driving styles. We combine this penalty structure with a set of signal temporal logic formulas, based on the Responsibility-Sensitive Safety model, to generate trajectories that we expected to correlate with three different driving styles: aggressive, neutral, and defensive. An online study showed that people perceived different parameterizations of the motion planner as unique driving styles, and that most people tend to prefer a more defensive automated driving style, which correlated with their self-reported driving style.
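A small sketch can make the combination of signal temporal logic robustness and a penalty more concrete. The abstract does not give the paper's formulas or penalty structure, so everything below is an assumption for illustration: the function names, the hinge-shaped penalty, and the `weight` and `margin` parameters are hypothetical. The minimum-gap expression is the standard longitudinal safe distance from the Responsibility-Sensitive Safety model, and the robustness of "always (gap >= threshold)" is the minimum of (gap - threshold) over the trajectory.

```python
from typing import Sequence


def rss_min_gap(
    v_rear: float,
    v_front: float,
    response_time: float,
    a_max_accel: float,
    b_min_brake: float,
    b_max_brake: float,
) -> float:
    """Longitudinal safe distance from the RSS model (metres, SI units)."""
    v_after = v_rear + response_time * a_max_accel
    gap = (
        v_rear * response_time
        + 0.5 * a_max_accel * response_time ** 2
        + v_after ** 2 / (2.0 * b_min_brake)
        - v_front ** 2 / (2.0 * b_max_brake)
    )
    return max(gap, 0.0)


def always_robustness(signal: Sequence[float], threshold: float) -> float:
    """Robustness of the STL formula 'always (signal >= threshold)'."""
    return min(value - threshold for value in signal)


def style_penalty(robustness: float, weight: float, margin: float = 0.0) -> float:
    """Hypothetical hinge penalty: zero once robustness reaches `margin`."""
    return weight * max(0.0, margin - robustness)


if __name__ == "__main__":
    required = rss_min_gap(10.0, 10.0, 1.0, 0.0, 5.0, 5.0)
    gaps = [12.0, 11.0, 15.0]  # measured gaps along a candidate trajectory
    rho = always_robustness(gaps, required)
    # A larger margin or weight would model a more defensive style.
    print(required, rho, style_penalty(rho, weight=2.0, margin=3.0))
```

In this sketch a defensive style corresponds to a larger `margin` or `weight`, so trajectories that only barely satisfy the safety formula are penalized, while an aggressive style tolerates robustness values close to zero.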