Robotics: Science and Systems XVII 2021
DOI: 10.15607/rss.2021.xvii.062
HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing

Abstract: In this work we present a planning and control method for a quadrotor in an autonomous drone race. Our method combines the advantages of both model-based optimal control and model-free deep reinforcement learning. We consider a single drone racing on a track marked by a series of gates, through which it must maneuver in minimum time. First, we solve the discretized Hamilton-Jacobi-Bellman (HJB) equation to produce a closed-loop policy for a simplified, reduced-order model of the drone. Next, we train a deep n…
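The first step the abstract describes — solving a discretized HJB equation to obtain a closed-loop policy for a reduced-order model — can be illustrated with a minimal value-iteration sketch. Everything below is an assumption for illustration, not the paper's method: a 1-D double integrator stands in for the reduced-order drone model, and the grid, running cost, and discount factor `gamma` (added only so this sketch provably converges) are invented here.

```python
import numpy as np

# Grid over the reduced state (position x, velocity v) and a discretized control set.
xs = np.linspace(-1.0, 1.0, 21)
vs = np.linspace(-1.0, 1.0, 21)
us = np.array([-1.0, 0.0, 1.0])   # acceleration commands
dt = 0.1
gamma = 0.95                      # discount: an assumption, for guaranteed convergence

V = np.zeros((len(xs), len(vs)))  # value function on the grid
policy = np.zeros_like(V)         # greedy control at each grid node

def nearest(grid, val):
    """Snap a continuous successor state to the index of the nearest grid node."""
    step = grid[1] - grid[0]
    return int(np.clip(round((val - grid[0]) / step), 0, len(grid) - 1))

for _ in range(200):              # fixed-point iteration on the discretized HJB equation
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        for j, v in enumerate(vs):
            best = np.inf
            for u in us:
                xn, vn = x + v * dt, v + u * dt           # Euler step of the dynamics
                cost = dt * (x**2 + v**2 + 0.1 * u**2)    # quadratic running cost
                q = cost + gamma * V[nearest(xs, xn), nearest(vs, vn)]
                if q < best:
                    best, policy[i, j] = q, u
            V_new[i, j] = best
    V = V_new
```

The resulting `policy` array is a closed-loop feedback law on the grid; in the paper's pipeline an analogous low-dimensional policy would then initialize the deep RL training.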

Cited by 10 publications (10 citation statements)
References 27 publications
“…Fig. 3: Trajectories generated by NeuralOC and our method for controlling a quadrotor to reach a desired state at [3,3,3].…”
Section: B. Optimal Control Using Learned Dynamics
confidence: 99%
“…We evaluate these methods on the task of controlling a quadrotor to reach a goal pose. The goal pose is set to the position [3,3,3] in an upright orientation. The initial state positions are sampled from a normal distribution around the origin N ([0, 0, 0], I), and the rest of the state variables are initialized as 0, corresponding to an upright orientation and no initial velocities.…”
Section: B. Optimal Control Using Learned Dynamics
confidence: 99%
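The evaluation setup quoted above — goal pose at [3, 3, 3], initial positions drawn from N([0, 0, 0], I), all other state variables zeroed — can be sketched as follows. The 12-dimensional state layout (position, Euler angles, linear velocity, angular velocity) is an assumption for illustration, not taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_initial_state():
    # Assumed 12-D state: position (3), Euler angles (3), linear vel (3), angular vel (3).
    state = np.zeros(12)
    # Position ~ N([0, 0, 0], I); everything else stays 0 (upright, at rest).
    state[:3] = rng.multivariate_normal(mean=np.zeros(3), cov=np.eye(3))
    return state

goal = np.zeros(12)
goal[:3] = [3.0, 3.0, 3.0]   # desired position; zero angles = upright orientation

states = np.array([sample_initial_state() for _ in range(100)])
```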
“…The proposed spatial ILC is compared with [6] in the same racing environment to verify the proposed approach's model-free and fast online iterative features. The racing environment, namely Soccer Field, is from [4] shown as Fig.
Section: B. Comparison in Race Competition
confidence: 99%
“…Given the conditions of the known environment, pushing drones to their physical limits presents challenges to researchers. There are also many existing solutions to autonomous competitions, including the use of continuous-time polynomial trajectory planning [5], the time-discrete trajectories method with reinforcement learning (RL) methods [6], [7], search and sampling-based methods [8], and model-based optimization methods [9]. Continuous-time polynomial trajectory planning has high computational efficiency, but Shuli Lv, Yan Gao, Jiaxing Che, Quan Quan (Corresponding Author) are with School of Automation Science and Electrical Engineering, Beihang University, Beijing, 100191, P.R.…
Section: Introduction
confidence: 99%