2020
DOI: 10.3390/robotics9010008
Sim-to-Real Quadrotor Landing via Sequential Deep Q-Networks and Domain Randomization

Abstract: The autonomous landing of an Unmanned Aerial Vehicle (UAV) on a marker is one of the most challenging problems in robotics. Many solutions have been proposed, with the best results achieved via customized geometric features and external sensors. This paper discusses for the first time the use of deep reinforcement learning as an end-to-end learning paradigm to find a policy for the autonomous landing of UAVs. Our method is based on a divide-and-conquer paradigm that splits the task into sequential sub-tasks, each one a…
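The divide-and-conquer idea in the abstract — splitting the landing task into sequential sub-tasks, each with its own learned policy — can be sketched as a simple dispatcher. All names and sub-tasks below (marker alignment, then descent) are illustrative assumptions, not the authors' implementation:

```python
# Hedged sketch of sequential sub-task policies for autonomous landing.
# Each sub-task (here: horizontal alignment, then vertical descent) would in
# practice be a separately trained DQN; plain functions stand in for them.

def align_policy(observation):
    # placeholder for a sub-policy trained only for marker alignment
    return "move_toward_marker"

def descend_policy(observation):
    # placeholder for a sub-policy trained only for vertical descent
    return "descend"

def select_action(observation, aligned):
    """Dispatch to the sub-policy responsible for the current sub-task."""
    if not aligned:
        return align_policy(observation)
    return descend_policy(observation)

print(select_action({}, aligned=False))  # -> move_toward_marker
print(select_action({}, aligned=True))   # -> descend
```

The design point is that each sub-policy faces a simpler, narrower problem than a single end-to-end policy would, which is what makes the sequential decomposition tractable.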

Cited by 33 publications
(27 citation statements)
References 31 publications
“…In 2013, DeepMind innovatively combined deep learning (DL) with reinforcement learning (RL), creating a new hotspot in artificial intelligence known as DRL [20]. By leveraging the decision-making capabilities of RL and the perception capabilities of DL, DRL has proven effective at controlling UAVs [21][22][23][24][25][26][27][28][29][30][31]. Zhu [21] proposed a framework for target-driven visual navigation that addressed some of the limitations preventing DRL algorithms from being applied in realistic settings.…”
Section: Related Work
confidence: 99%
“…Singla [28] designed a deep recurrent Q-network [34] with temporal attention that exhibited significant improvements over DQN and D3QN [32] for UAV motion planning in cluttered, unseen environments. For the autonomous landing task of a UAV, Polvara [29] introduced a sequential DQN that is comparable with standard DQN and human pilots, while being quantitatively better under noisy conditions. Wang [30] proposed a fast recurrent deterministic policy gradient algorithm to address the UAV's autonomous navigation problem in large-scale complex environments.…”
Section: Related Work
confidence: 99%
“…The target network retains a fixed value while the online Q-network learns for a number of steps, and is periodically reset to the online Q-network's values [35]. This stabilizes learning because the Q-network is updated toward a fixed target rather than a moving one.…”
Section: Introduction To The Algorithms Of Deep Q-Learning
confidence: 99%
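The fixed-target mechanism quoted above can be sketched in a few lines: hold a frozen copy of the online network's parameters, and sync it every few training steps. This is a minimal illustration under assumed names (plain dicts stand in for network weights), not the cited paper's code:

```python
import copy

class DQNWithTarget:
    """Minimal sketch of the fixed-target trick: the target network is held
    constant for several updates, then periodically reset to the online
    Q-network's parameters. Dicts stand in for real network weights."""

    def __init__(self, sync_every=4):
        self.q_params = {"w": 0.0}                         # online network
        self.target_params = copy.deepcopy(self.q_params)  # frozen target copy
        self.sync_every = sync_every
        self.steps = 0

    def train_step(self, gradient=1.0):
        # the online network is updated on every step
        self.q_params["w"] += gradient
        self.steps += 1
        # the target is reset to the online values only periodically
        if self.steps % self.sync_every == 0:
            self.target_params = copy.deepcopy(self.q_params)

agent = DQNWithTarget(sync_every=4)
for _ in range(3):
    agent.train_step()
print(agent.q_params["w"], agent.target_params["w"])  # -> 3.0 0.0 (target stale)
agent.train_step()
print(agent.target_params["w"])                       # -> 4.0 (target synced)
```

Between syncs, the regression target for the Bellman update stays constant, which is exactly why the quoted text describes this as an effective way of learning.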
“…Responding to these needs, various studies and algorithms have been developed for autonomous flight systems. In particular, many machine-learning-based (ML-based) methods have been proposed for autonomous path finding [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]. However, they are limited when applied to a large target area.…”
Section: Introduction
confidence: 99%
“…The authors proposed an MEP-DDPG algorithm to address the UAV's autonomous motion planning (AMP) problem. The authors of [18,19] proposed autonomous landing mechanisms for UAVs based on sequential DQN and DDPG algorithms, respectively.…”
Section: Introduction
confidence: 99%