2022
DOI: 10.20517/ir.2022.20
Deep reinforcement learning for real-world quadrupedal locomotion: a comprehensive review

Abstract: Building controllers for legged robots with agility and intelligence has been one of the typical challenges in the pursuit of artificial intelligence (AI). As an important part of the AI field, deep reinforcement learning (DRL) can realize sequential decision making without physical modeling through end-to-end learning and has achieved a series of major breakthroughs in quadrupedal locomotion research. In this review article, we systematically organize and summarize relevant important literature, covering DRL …

Cited by 5 publications (4 citation statements)
References 74 publications
“…However, such an approach would require re-tuning the hyperparameters from scratch each time we face a new task. It is well-known that tuning hyperparameters for quadruped robot control problems is an extremely complex and tedious process [10]. This would cause our method to lose its plug-and-play capability.…”
Section: Loss of Plasticity and Re-initialization (mentioning)
confidence: 99%
“…[18] PG-based algorithms maximize the action policy's expected return by updating the action policy directly [19]. The PPO algorithm's main objective is [18]

$L(\theta) = \mathbb{E}\left[ L_{actor}(\theta) - c_1 L_{critic}(\theta) + c_2 S_{\theta}(o_t) \right]$, (11)

where $L_{actor}$ and $L_{critic}$ are the loss functions of the actor and critic networks, $\theta$ denotes the network parameters, $c_1$ and $c_2$ are weighting coefficients, $S_{\theta}$ is an entropy bonus used to ensure sufficient exploration, and $o_t$ is the observation of the actor network [18]. By considering these terms, the loss computed by Equation (11) depends on the parameters of both the actor and critic networks.…”
Section: The PPO Algorithm (mentioning)
confidence: 99%
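The combined objective in Equation (11) translates directly into a training loss. Below is a minimal PyTorch sketch of that combination; the clipped-surrogate form of $L_{actor}$, the squared-error form of $L_{critic}$, the default coefficients c1=0.5 and c2=0.01, and the function name ppo_loss are assumptions drawn from standard PPO practice, not details taken from the cited paper.

```python
import torch
import torch.nn.functional as F

def ppo_loss(ratio, advantage, value_pred, value_target, entropy,
             clip_eps=0.2, c1=0.5, c2=0.01):
    """Combined PPO objective from Equation (11), negated for minimization.

    ratio:        pi_theta(a_t | o_t) / pi_theta_old(a_t | o_t), per sample
    advantage:    advantage estimates, per sample
    value_pred:   critic outputs V_theta(o_t)
    value_target: bootstrapped return targets
    entropy:      policy entropy S_theta(o_t), per sample
    """
    # L_actor: clipped surrogate objective (assumed form; maximized in Eq. 11)
    l_actor = torch.min(
        ratio * advantage,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage,
    ).mean()

    # L_critic: squared error between predicted and target values (assumed form)
    l_critic = F.mse_loss(value_pred, value_target)

    # Equation (11) is maximized, so its negation serves as the training loss.
    return -(l_actor - c1 * l_critic + c2 * entropy.mean())
```

Because the returned scalar mixes the actor, critic, and entropy terms, a single backward pass updates the parameters of both networks, which is exactly the dependence on $\theta$ that the quoted passage points out.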
“…Many types of DRL algorithms have been proposed, such as the deep Q network (DQN), deep deterministic policy gradient (DDPG), and proximal policy optimization (PPO). DRL has been applied to UAV path control [10], quadrupedal locomotion control [11], autonomous platoon control [12], etc. At the same time, DRL has been used to improve the operational efficiency of air combat [9,13,14].…”
Section: Introduction (mentioning)
confidence: 99%
“…Furthermore, they designed a defogging algorithm and an edge-fitting term to improve the segmentation process, ultimately achieving better results. Since 2012, with the rise of convolutional networks, artificial intelligence [10] has experienced rapid growth and has gradually found its way into various aspects of human life. Long J.…”
Section: Related Research (mentioning)
confidence: 99%