2021
DOI: 10.1109/access.2021.3060426

Application of Deep Reinforcement Learning in Maneuver Planning of Beyond-Visual-Range Air Combat

Abstract: Beyond-visual-range (BVR) engagement is becoming increasingly common on the modern air battlefield. The key difficulty for pilots in such a fight is maneuver planning, which reflects the tactical decision-making capacity of both sides and determines success or failure. In this paper, we propose an intelligent maneuver planning method for BVR combat using an improved deep Q network (DQN). First, a basic combat environment is built, which mainly includes a flight motion model, a relative motion model, and a mis…
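As a rough illustration of the kind of combat environment the abstract describes, the sketch below implements a simple 3-DOF point-mass flight motion model and a relative-motion observation in Python. The state variables, control inputs (load factors and bank angle), and relative-geometry quantities are assumptions chosen for illustration, not the paper's exact formulation.

import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def step_aircraft(state, control, dt=0.1):
    """Advance one aircraft state [x, y, z, v, gamma, psi] by dt seconds.
    control = (nx, nz, mu): tangential load factor, normal load factor, bank angle."""
    x, y, z, v, gamma, psi = state
    nx, nz, mu = control
    v_dot     = G * (nx - np.sin(gamma))
    gamma_dot = G / v * (nz * np.cos(mu) - np.cos(gamma))
    psi_dot   = G * nz * np.sin(mu) / (v * np.cos(gamma))
    x_dot = v * np.cos(gamma) * np.cos(psi)
    y_dot = v * np.cos(gamma) * np.sin(psi)
    z_dot = v * np.sin(gamma)
    return state + dt * np.array([x_dot, y_dot, z_dot, v_dot, gamma_dot, psi_dot])

def relative_state(own, enemy):
    """Relative-motion quantities often used as the combat observation:
    distance plus approximate antenna-train and aspect angles."""
    d_vec = enemy[:3] - own[:3]
    dist = np.linalg.norm(d_vec)
    own_vel = np.array([np.cos(own[4]) * np.cos(own[5]),
                        np.cos(own[4]) * np.sin(own[5]),
                        np.sin(own[4])])
    enemy_vel = np.array([np.cos(enemy[4]) * np.cos(enemy[5]),
                          np.cos(enemy[4]) * np.sin(enemy[5]),
                          np.sin(enemy[4])])
    ata = np.arccos(np.clip(np.dot(own_vel, d_vec) / (dist + 1e-8), -1.0, 1.0))
    aa  = np.arccos(np.clip(np.dot(enemy_vel, -d_vec) / (dist + 1e-8), -1.0, 1.0))
    return dist, ata, aa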

Cited by 45 publications (27 citation statements)
References 19 publications
“…A comprehensive mathematical model of air combat confrontation is established to effectively improve the fidelity of the confrontation state. Through a large number of machine-to-machine confrontation simulation experiments with various initial states based on the MATLAB platform [22,23], an analysis of the maneuvering decision process is performed; the results verify that the model can autonomously output reasonable maneuvers for a certain goal and ensure high-quality results for fighter missile attack decisions and kill efficiency calculations. The proposed dynamic method based on influence diagrams can be used to realistically assess a situation involving BVR air combat, and the superiority and killing benefit calculations are more accurate than those based on static methods; therefore, this method can accurately relate equipment contributions and the roles of equipment in a system.…”
Section: Introduction (mentioning)
confidence: 96%
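To make the experimental setup described in this statement concrete, the sketch below runs a batch of engagement simulations with randomized initial states and tallies a win rate; the run_engagement() stub, the state ranges, and the number of runs are hypothetical (the cited work used MATLAB rather than Python).

import numpy as np

rng = np.random.default_rng(0)

def sample_initial_state():
    # Randomized BVR initial conditions for one machine-to-machine engagement.
    return {
        "distance_km": rng.uniform(30.0, 80.0),
        "altitude_m":  rng.uniform(5000, 11000),
        "heading_deg": rng.uniform(0, 360),
        "speed_mps":   rng.uniform(200, 320),
    }

def run_engagement(init):
    # Placeholder for the closed-loop confrontation simulation;
    # the real model would propagate both aircraft and the missile engagement.
    return {"red_win": rng.random() < 0.5}

results = [run_engagement(sample_initial_state()) for _ in range(1000)]
win_rate = np.mean([r["red_win"] for r in results])
print(f"Red win rate over 1000 randomized engagements: {win_rate:.2%}")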
“…However, when the space dimension increases, it faces the explosion problem of the state space and action space, and its decision accuracy is inevitably affected by fuzziness. Many scholars have adopted different DQN algorithms, such as DQN [9,32], LSTM-DQN [33], and MS-DDQN [34], to realize maneuver learning for UAVs in short-range air combat and to solve the decision problem of a continuous state space. However, the action space in these works basically takes the form of a maneuver action library, and a limited maneuver library can hardly reflect the maneuvers flown in actual air combat.…”
Section: Introduction (mentioning)
confidence: 99%
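The "maneuver action library" mentioned in this statement is typically a small set of basic flight maneuvers from which the DQN picks one per decision step. The sketch below shows such a library; the seven maneuvers and their control values (nx, nz, mu) are illustrative assumptions, not taken from any one of the cited papers.

import numpy as np

MANEUVER_LIBRARY = {
    0: ("steady flight",    dict(nx=0.0,  nz=1.0,  mu=0.0)),
    1: ("max acceleration", dict(nx=2.0,  nz=1.0,  mu=0.0)),
    2: ("max deceleration", dict(nx=-1.0, nz=1.0,  mu=0.0)),
    3: ("max left turn",    dict(nx=0.0,  nz=5.0,  mu=-np.deg2rad(75))),
    4: ("max right turn",   dict(nx=0.0,  nz=5.0,  mu=np.deg2rad(75))),
    5: ("max pull up",      dict(nx=0.0,  nz=5.0,  mu=0.0)),
    6: ("max push down",    dict(nx=0.0,  nz=-3.0, mu=0.0)),
}

def action_to_control(action_index):
    """Map a DQN's discrete action index to the control inputs of a
    point-mass flight model (e.g. the step_aircraft() sketch above)."""
    _name, control = MANEUVER_LIBRARY[action_index]
    return control["nx"], control["nz"], control["mu"]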
“…However, the maneuver library of this method only contains five maneuvers, which cannot meet the needs of air combat. Hu et al. (2021) proposed using an improved deep Q network (Mnih et al., 2015) for maneuver decisions in autonomous air combat: they constructed the relative motion model, missile attack model, and maneuver decision-making framework, designed the reward function for training agents, and replaced the strategy network in the deep Q network with a situation perception layer and a value fitting layer. This method improves the winning rate in air combat, but the maneuver library is relatively simple and hardly meets the needs of air combat.…”
Section: Introduction (mentioning)
confidence: 99%
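A minimal sketch of the network structure this statement describes, assuming PyTorch: a Q-network split into a situation-perception part and a value-fitting part over a discrete maneuver library. The input dimension, layer sizes, and seven-action output are assumptions for illustration, not the cited paper's exact architecture.

import torch
import torch.nn as nn

class ManeuverQNetwork(nn.Module):
    def __init__(self, situation_dim=8, n_maneuvers=7, hidden=128):
        super().__init__()
        # Situation perception layers: encode relative distance, angles,
        # and altitude/speed differences into a feature vector.
        self.perception = nn.Sequential(
            nn.Linear(situation_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Value fitting layers: map the perceived situation to one Q-value
        # per maneuver in the discrete action library.
        self.value_fitting = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_maneuvers),
        )

    def forward(self, situation):
        return self.value_fitting(self.perception(situation))

# Greedy maneuver selection from the Q-values (epsilon-greedy exploration
# would be added during training).
net = ManeuverQNetwork()
situation = torch.randn(1, 8)  # placeholder situation vector
best_maneuver = net(situation).argmax(dim=1).item()
print("selected maneuver index:", best_maneuver)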