“…In particular, UAV pilots that utilize reinforcement learning (RL) are more flexible than rule-based pilots. With RL, UAV pilots have advanced to a level at which they can replace humans in decision-making [5,6]. Against this backdrop, the Air Combat Evolution program was created by the Defense Advanced Research Projects Agency in 2019 to develop AI pilots to replace human pilots [7].…”
Recently, the demand for unmanned aerial vehicle technology has increased. In particular, AI pilots trained through reinforcement learning (RL) are more flexible than those using rule-based methods, and RL-based AI pilots are expected to replace human pilots in the future. Rather than completely replacing humans, however, recent studies on AI pilots have moved toward collaboration between manned and unmanned aircraft. AI pilots have several advantages over humans. For example, human pilots avoid head-on situations because of the risk of collision, whereas AI pilots may prefer head-on situations to finish the episode quickly. This study proposes a two-circle-based transfer learning method that demonstrates excellent performance in head-on situations. Based on the experimental results, the proposed two-circle-based multi-task transfer learning model outperforms an RL model trained without transfer learning. A previous study on a transfer-learning-based technique has been conducted; however, its one-circle-based learning technique was specialized for tail chasing only, limiting its application (Bae et al. in IEEE Access 11:26427–26440, 2023). In practice, the proposed two-circle-based learning technique outperforms the one-circle-based transfer learning technique in head-on situations.
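The core mechanism behind the transfer learning described above is initializing a target-task policy from weights pre-trained on a related source task, then fine-tuning. The sketch below illustrates only that mechanism with a toy linear model; the tasks, learning rates, and step counts are illustrative assumptions, not the paper's actual network or circle-based curriculum.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(weights, X, y, lr=0.1, steps=200):
    """Plain gradient descent on a least-squares loss (stand-in for RL training)."""
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(X)
        w -= lr * grad
    return w

# Source task: learn y = 2x (stand-in for the pre-training task).
X = rng.normal(size=(64, 1))
w_source = train(np.zeros(1), X, 2.0 * X[:, 0])

# Target task: y = 2.5x, related but not identical (stand-in for the new task).
y_target = 2.5 * X[:, 0]

# Transfer: start from pre-trained weights; baseline: start from scratch.
# Both get the same small fine-tuning budget of 20 steps.
w_transfer = train(w_source, X, y_target, steps=20)
w_scratch  = train(np.zeros(1), X, y_target, steps=20)

err_transfer = abs(w_transfer[0] - 2.5)
err_scratch  = abs(w_scratch[0] - 2.5)
```

Because the transferred weights start closer to the target-task optimum, the same fine-tuning budget leaves a smaller residual error, which is the intuition for why the pre-trained model converges faster on the new scenario.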
“…The results showed that the RL agent was able to outperform the baseline agent in terms of survival rate. More recently, Hu et al. [18] trained a long short-term memory (LSTM) network within a deep Q-network (DQN) framework for air combat maneuvering decisions, and this was more forward-looking and efficient in its decision-making than fully-connected-neural-network- and statistical-principle-based algorithms [19]. In addition, Li proposed a deep reinforcement learning method based on proximal policy optimization (PPO) to learn combat strategies from observations in an end-to-end manner [20,21], and the adversarial results showed that this PPO agent can beat the adversary with a win rate of approximately 97%.…”
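At the heart of the DQN framework mentioned above is the one-step temporal-difference update; the cited work layers an LSTM network and experience replay on top, which are omitted here. A minimal tabular sketch with toy states and actions (not the papers' air-combat variables):

```python
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One-step TD update: Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a)),
    where target = r + gamma * max_a' Q(s_next, a')."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q

# Two states, two actions; the transition 0 --right--> 1 yields reward 1.
Q = {0: {"left": 0.0, "right": 0.0}, 1: {"left": 0.0, "right": 0.0}}
Q = q_update(Q, s=0, a="right", r=1.0, s_next=1)
# After one update: Q[0]["right"] = 0.5 * (1.0 + 0.9 * 0.0 - 0.0) = 0.5
```

A DQN replaces the table with a neural network that generalizes across states; the LSTM variant additionally conditions on the history of observations, which is what makes its decisions "forward-looking".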
This paper proposes an air combat training framework based on hierarchical reinforcement learning to address non-convergence in training caused by the curse of dimensionality, i.e., the large state space encountered during air combat tactical pursuit. With hierarchical reinforcement learning, the three-dimensional problem can be decomposed into two-dimensional sub-problems, improving training performance over other baselines. To further improve overall learning performance, a meta-learning-based algorithm is established and a corresponding reward function is designed to boost the agent's performance in the air combat tactical chase scenario. The results show that the proposed framework achieves better performance than the baseline approach.
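The decomposition idea above can be illustrated with a hedged sketch: a high-level policy handles a 2-D sub-goal in the horizontal plane while a separate low-level controller handles altitude, so no single policy must search the full 3-D state space. The proportional rules below are illustrative stand-ins, not the paper's learned policies.

```python
import math

def high_level_heading(own_xy, target_xy):
    """2-D sub-problem: desired heading toward the target's horizontal position."""
    dx = target_xy[0] - own_xy[0]
    dy = target_xy[1] - own_xy[1]
    return math.atan2(dy, dx)

def low_level_climb(own_alt, target_alt, gain=0.5):
    """Separate 1-D sub-problem: proportional climb command toward target altitude."""
    return gain * (target_alt - own_alt)

# Pursuer at origin, 1000 m altitude; target at (100, 100), 1200 m.
heading = high_level_heading((0.0, 0.0), (100.0, 100.0))  # 45 degrees in radians
climb = low_level_climb(1000.0, 1200.0)
```

In a learned version, each level would be its own RL policy with its own reward, which is what shrinks the effective state space each policy must cover.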
“…Because rule-based techniques enable aircraft to perform only predetermined maneuvers under given conditions, it is difficult for them to respond appropriately to unexpected situations [24]. Hence, recent studies have utilized RL techniques, which are particularly suited to learning to make decisions quickly in unpredictable or uncertain situations [25][26][27][28][29][30][31]. Masadeh et al. [6] utilized a multi-agent deep deterministic policy gradient and Bayesian optimization to optimize the trajectory and network formation of UAVs for rapid data transmission, minimizing energy consumption and transmission delay in a setting where multiple UAVs serve as relays in a wireless network.…”
Despite the growing demand for unmanned aerial vehicles (UAVs), the use of conventional UAVs is limited, as most of them must be remotely operated by a person outside the vehicle's field of view. Recently, many studies have introduced reinforcement learning (RL) to address hurdles to the autonomous flight of UAVs. However, most previous studies have assumed overly simplified environments, and thus they cannot be applied to real-world UAV operation scenarios. To address these limitations, we propose a stepwise soft actor–critic (SeSAC) algorithm for efficient learning in a continuous state and action space. SeSAC aims to overcome the inefficiency of attempting challenging tasks from the beginning of training. Instead, it starts with easier missions and gradually increases the difficulty level during training, ultimately achieving the final goal. We also control a learning hyperparameter of the soft actor–critic algorithm and implement a positive buffer mechanism during training to enhance learning effectiveness. Our proposed algorithm was verified in a six-degree-of-freedom (6-DOF) flight environment with high-dimensional state and action spaces. The experimental results demonstrate that the proposed algorithm successfully completed missions in two challenging scenarios, one for disaster management and another for counter-terrorism, while surpassing the performance of other baseline approaches.
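The stepwise idea summarized above can be sketched as a curriculum scheduler: advance to a harder difficulty level only when the recent success rate clears a threshold, and keep successful episodes in a separate "positive" buffer for replay. The SAC learner itself is omitted, and the threshold, window size, and level names are assumed values, not those of the paper.

```python
from collections import deque

class StepwiseScheduler:
    """Illustrative curriculum scheduler with a positive-episode buffer."""

    def __init__(self, levels, threshold=0.8, window=10):
        self.levels = levels              # difficulty levels, easiest first
        self.idx = 0                      # current level index
        self.threshold = threshold        # success rate needed to advance
        self.recent = deque(maxlen=window)  # rolling success record
        self.positive_buffer = []         # successful episodes kept for replay

    @property
    def difficulty(self):
        return self.levels[self.idx]

    def report(self, episode, success):
        """Record an episode outcome; advance difficulty when warranted."""
        self.recent.append(success)
        if success:
            self.positive_buffer.append(episode)
        full = len(self.recent) == self.recent.maxlen
        if full and sum(self.recent) / len(self.recent) >= self.threshold:
            if self.idx < len(self.levels) - 1:
                self.idx += 1
            self.recent.clear()           # re-evaluate at the new level

sched = StepwiseScheduler(levels=["easy", "medium", "hard"])
for ep in range(10):
    sched.report(episode=ep, success=True)  # ten straight successes
```

After ten consecutive successes the scheduler moves from "easy" to "medium" and retains all ten episodes in the positive buffer; clearing the rolling window on each advance forces the agent to re-prove itself at the harder level.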