Robotics: Science and Systems XVII 2021
DOI: 10.15607/rss.2021.xvii.062
HJB-RL: Initializing Reinforcement Learning with Optimal Control Policies Applied to Autonomous Drone Racing

Abstract: In this work we present a planning and control method for a quadrotor in an autonomous drone race. Our method combines the advantages of both model-based optimal control and model-free deep reinforcement learning. We consider a single drone racing on a track marked by a series of gates, through which it must maneuver in minimum time. First, we solve the discretized Hamilton-Jacobi-Bellman (HJB) equation to produce a closed-loop policy for a simplified, reduced-order model of the drone. Next, we train a deep n…
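The first step the abstract describes — solving a discretized HJB equation to obtain a closed-loop policy for a reduced-order model — can be illustrated with a minimal value-iteration sketch. Everything below is an assumption for illustration, not the paper's method: a 1-D double integrator stands in for the reduced-order drone model, and the grid, running cost, and discount factor `gamma` (added only so this sketch provably converges) are invented here.

```python
import numpy as np

# Grid over the reduced state (position x, velocity v) and a discretized control set.
xs = np.linspace(-1.0, 1.0, 21)
vs = np.linspace(-1.0, 1.0, 21)
us = np.array([-1.0, 0.0, 1.0])   # acceleration commands
dt = 0.1
gamma = 0.95                      # discount: an assumption, for guaranteed convergence

V = np.zeros((len(xs), len(vs)))  # value function on the grid
policy = np.zeros_like(V)         # greedy control at each grid node

def nearest(grid, val):
    """Snap a continuous successor state to the index of the nearest grid node."""
    step = grid[1] - grid[0]
    return int(np.clip(round((val - grid[0]) / step), 0, len(grid) - 1))

for _ in range(200):              # fixed-point iteration on the discretized HJB equation
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        for j, v in enumerate(vs):
            best = np.inf
            for u in us:
                xn, vn = x + v * dt, v + u * dt           # Euler step of the dynamics
                cost = dt * (x**2 + v**2 + 0.1 * u**2)    # quadratic running cost
                q = cost + gamma * V[nearest(xs, xn), nearest(vs, vn)]
                if q < best:
                    best, policy[i, j] = q, u
            V_new[i, j] = best
    V = V_new
```

The resulting `policy` array is a closed-loop feedback law on the grid; in the paper's pipeline an analogous low-dimensional policy would then initialize the deep RL training.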

Cited by 10 publications (10 citation statements)
References 27 publications
“…Fig. 3: Trajectories generated by NeuralOC and our method for controlling a quadrotor to reach a desired state at [3,3,3].…”
Section: B. Optimal Control Using Learned Dynamics
confidence: 99%
“…We evaluate these methods on the task of controlling a quadrotor to reach a goal pose. The goal pose is set to the position [3,3,3] in an upright orientation. The initial state positions are sampled from a normal distribution around the origin N ([0, 0, 0], I), and the rest of the state variables are initialized as 0, corresponding to an upright orientation and no initial velocities.…”
Section: B. Optimal Control Using Learned Dynamics
confidence: 99%
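The evaluation setup quoted above — goal pose at [3, 3, 3], initial positions drawn from N([0, 0, 0], I), all other state variables zeroed — can be sketched as follows. The 12-dimensional state layout (position, Euler angles, linear velocity, angular velocity) is an assumption for illustration, not taken from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_initial_state():
    # Assumed 12-D state: position (3), Euler angles (3), linear vel (3), angular vel (3).
    state = np.zeros(12)
    # Position ~ N([0, 0, 0], I); everything else stays 0 (upright, at rest).
    state[:3] = rng.multivariate_normal(mean=np.zeros(3), cov=np.eye(3))
    return state

goal = np.zeros(12)
goal[:3] = [3.0, 3.0, 3.0]   # desired position; zero angles = upright orientation

states = np.array([sample_initial_state() for _ in range(100)])
```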
“…The proposed spatial ILC is compared with [6] in the same racing environment to verify the proposed approach's model-free and fast online iterative features. The racing environment, namely Soccer Field, is from [4] shown as Fig.
Section: B. Comparison in Race Competition
confidence: 99%
“…Given the conditions of the known environment, pushing drones to their physical limits presents challenges to researchers. There are also many existing solutions to autonomous competitions, including the use of continuous-time polynomial trajectory planning [5], the time-discrete trajectories method with reinforcement learning (RL) methods [6], [7], search and sampling-based methods [8], and model-based optimization methods [9]. Continuous-time polynomial trajectory planning has high computational efficiency, but Shuli Lv, Yan Gao, Jiaxing Che, Quan Quan (Corresponding Author) are with School of Automation Science and Electrical Engineering, Beihang University, Beijing, 100191, P.R.…
Section: Introduction
confidence: 99%