2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros47612.2022.9981973
Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions

Cited by 41 publications (24 citation statements)
References 39 publications
“…Our method is based on the imitation-RL method AMP (Peng et al., 2021), which is capable of learning complex naturalistic motion on humanoid skeletons and shows good transferability of simulation-learned policies to real-world robots (Escontrela et al., 2022; Vollenweider et al., 2022). AMP is a successor to Generative Adversarial Imitation Learning (Ho and Ermon, 2016): it takes one or more clips of reference motion and learns a motor control policy π_θ that imitates the motion dynamics of the reference(s) through a discriminator network D_φ.…”
Section: Methods
confidence: 99%
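The quoted passage describes the core AMP mechanism: a discriminator D_φ scores state transitions as reference-like or policy-like, and its output is mapped to a style reward for the policy π_θ. Below is a minimal PyTorch sketch of that idea, using the least-squares reward mapping r = max(0, 1 − 0.25(d − 1)²) from Peng et al. (2021); the class and function names are illustrative and not taken from any of the cited codebases.

# Minimal sketch of an AMP-style discriminator and style reward.
# Names (AMPDiscriminator, style_reward) are illustrative assumptions.
import torch
import torch.nn as nn

class AMPDiscriminator(nn.Module):
    """D_phi: scores state transitions (s, s') as reference-like vs. policy-like."""
    def __init__(self, transition_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, transition):
        return self.net(transition)

def style_reward(disc, s, s_next):
    """Least-squares style reward from Peng et al. (2021): max(0, 1 - 0.25*(d - 1)^2)."""
    with torch.no_grad():
        d = disc(torch.cat([s, s_next], dim=-1))
        return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0).squeeze(-1)

During training, the discriminator is updated to output 1 on reference transitions and -1 on policy transitions, so the reward above is highest when the policy's motion is indistinguishable from the reference clips.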
“…For convenience, it will be referred to as 'complex rewards' in the following text. 2) Policy trained with the method from Escontrela et al. [5], using adversarial motion priors as a style reward. For convenience, it will be referred to as 'amp' in the following text.…”
Section: A. Quantitative Analysis
confidence: 99%
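The comparison above contrasts a hand-tuned 'complex rewards' baseline with the 'amp' policy, where the learned style reward replaces most shaping terms. The sketch below illustrates that split under stated assumptions: the velocity-tracking task reward, its exponential kernel, and the 0.5/0.5 weights are illustrative choices, not values reported in [5].

# Hedged sketch of the 'amp' reward structure: a simple task term plus
# the learned style reward, instead of many hand-designed shaping terms.
import numpy as np

def task_reward(base_velocity, command_velocity, scale=1.0):
    # Velocity-tracking objective; the exponential kernel is a common
    # choice in legged-locomotion RL, assumed here for illustration.
    err = np.sum((command_velocity - base_velocity) ** 2)
    return float(np.exp(-scale * err))

def total_reward(r_task, r_style, w_task=0.5, w_style=0.5):
    # r_style comes from the AMP discriminator and stands in for
    # hand-tuned smoothness/gait terms; only the task objective remains.
    return w_task * r_task + w_style * r_style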
“…[11] trained a neural-network state estimator to estimate robot states that cannot be read directly from sensory data. [5] used AMP to train control policies for a quadrupedal robot and showed that adversarial motion priors make good substitutes for complex reward functions. [12] trained a reinforcement-learning controller using unsupervised skill discovery and transferred it to a real quadruped robot.…”
Section: Deep Reinforcement Learning for Legged Locomotion
confidence: 99%