A Reinforcement Learning Method Based on an Improved Sampling Mechanism for Unmanned Aerial Vehicle Penetration

Wang, Yue; Li, Kexv; Zhuang, Xing; Liu, Xinyu; Li, Hanyu

doi:10.3390/aerospace10070642

Cited by 3 publications

(2 citation statements)

References 36 publications


“…In [20], Cao et al studied autonomous maneuver decisions for UCAV air combat based on the double deep Q network algorithm (DDQN) and stochastic game theory, which further boosted the performance of the UCAV in different combat cases. To compensate for the low training efficiency caused by simple sampling mechanisms, Wang et al proposed a task completion division soft actor-critic (TCD-SAC) algorithm for UAV penetration [21]. However, these studies did not take into account the uncertainty of environmental information obtained by agents in the real world, which leads to the degradation of DRL algorithm performance.…”

Section: Related Work (mentioning)

confidence: 99%

Memory-Enhanced Twin Delayed Deep Deterministic Policy Gradient (ME-TD3)-Based Unmanned Combat Aerial Vehicle Trajectory Planning for Avoiding Radar Detection Threats in Dynamic and Unknown Environments

Li, Zhang, Liu (2023), Remote Sensing

Unmanned combat aerial vehicle (UCAV) trajectory planning to avoid radar detection threats is a complicated optimization problem that has been widely studied. The rapid changes in Radar Cross Sections (RCSs), the unknown cruise trajectory of airborne radar, and the uncertain distribution of radars exacerbate the complexity of this problem. In this paper, we propose a novel UCAV trajectory planning method based on deep reinforcement learning (DRL) technology to overcome the adverse impacts caused by the dynamics and randomness of environments. A predictive control model is constructed to describe the dynamic characteristics of the UCAV trajectory planning problem in detail. To improve the UCAV’s predictive ability, we propose a memory-enhanced twin delayed deep deterministic policy gradient (ME-TD3) algorithm that uses an attention mechanism to effectively extract environmental patterns from historical information. The simulation results show that the proposed method can successfully train UCAVs to carry out trajectory planning tasks in dynamic and unknown environments. Furthermore, the ME-TD3 algorithm outperforms other classical DRL algorithms in UCAV trajectory planning, exhibiting superior performance and adaptability.

“…In scenarios where only radar detection is considered, and the reward is the sparsest, Ma Zijie et al [23] proposed an improved deep reinforcement learning algorithm to enhance cruise missiles' penetration trajectory planning capability when facing dynamic early warning radar threats. Wang Y et al [24] combined Task Completion Division (TCD) with the Soft Actor-Critic (SAC) algorithm to form the TCD-SAC algorithm, proposing a reinforcement learning method based on an improved sampling mechanism to enhance the penetration capability of unmanned aerial vehicles in air defense systems, with the improved sampling mechanism effectively mitigating the training difficulties caused by sparse rewards.…”

Section: Introduction (mentioning)

confidence: 99%

Stealth Aircraft Penetration Trajectory Planning in 3D Complex Dynamic Based on Radar Valley Radius and Turning Maneuver

Lu, Huang, Guan et al. (2024), Aerospace

Based on the quasi-six-degree-of-freedom flight dynamic equations, and considering that an increase in roll angle during maneuvering turns changes the elevation angle and thereby raises the radar cross-section, a computational model for the radar detection probability of aircraft in complex environments was constructed. By comprehensively considering flight parameters such as turning angle, roll angle, Mach number, and radar power factor, this study quantitatively analyzed the influence of these factors on the radar detection probability and revealed how that probability varies under different flight conditions. The results provide theoretical support for the decision-tree-based Radar Valley Radius and Turning Maneuver (RVR-TM) method and lay the foundation for subsequent intelligent decision-making models. To further optimize trajectory selection in complex environments, this study combines theoretical analysis with reinforcement learning to establish an intelligent decision-making model. The model is trained using the Proximal Policy Optimization (PPO) algorithm and, with precisely defined state space and reward functions, accomplishes intelligent trajectory planning for stealth aircraft under radar threat scenarios.
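The citation statement above credits TCD-SAC with an improved sampling mechanism that eases training under sparse rewards. As a rough illustration of that general idea only, the sketch below keeps transitions from completed and uncompleted episodes in separate pools and draws a fixed share of each batch from the completed pool. The class name `TwoPoolReplayBuffer`, the `success_fraction` parameter, and the two-pool split are all assumptions made for this example; the source does not describe how TCD-SAC actually divides or samples experience.

```python
import random
from collections import deque


class TwoPoolReplayBuffer:
    """Hypothetical replay buffer (not the TCD-SAC implementation).

    Transitions are stored in one of two pools depending on whether the
    episode they came from completed the task.
    """

    def __init__(self, capacity=10_000, success_fraction=0.5, seed=None):
        self.success = deque(maxlen=capacity)  # transitions from completed episodes
        self.other = deque(maxlen=capacity)    # transitions from all other episodes
        self.success_fraction = success_fraction
        self.rng = random.Random(seed)

    def add_episode(self, transitions, task_completed):
        """Store a whole episode in the pool matching its outcome."""
        pool = self.success if task_completed else self.other
        pool.extend(transitions)

    def sample(self, batch_size):
        """Draw a batch, taking up to `success_fraction` of it from the success pool."""
        wanted = round(batch_size * self.success_fraction)
        n_success = min(len(self.success), wanted)
        # Fill whatever is left of the batch from the other pool.
        n_other = min(len(self.other), batch_size - n_success)
        batch = self.rng.sample(list(self.success), n_success)
        batch += self.rng.sample(list(self.other), n_other)
        self.rng.shuffle(batch)
        return batch
```

Fixing the share of successful transitions per batch is one simple way to keep rare rewarded experience from being drowned out by uniform sampling; the ratio itself would need tuning for any real task.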


A Deep Reinforcement Learning-Based Intelligent Maneuvering Strategy for the High-Speed UAV Pursuit-Evasion Game

Yan, Liu, Gao et al. (2024), Drones

Given the rapid advancements in kinetic pursuit technology, this paper introduces a maneuvering strategy, denoted LSRC-TD3, which integrates line-of-sight (LOS) angle rate correction with deep reinforcement learning (DRL) for high-speed unmanned aerial vehicle (UAV) pursuit–evasion (PE) game scenarios, with the aim of evading high-speed, highly dynamic pursuers. In challenging game situations where the UAV is at a disadvantage in both speed and maximum available overload, its room to maneuver is severely compressed and evasion becomes significantly harder, placing higher demands on the strategy and timing of trajectory-changing maneuvers. While considering evasion, trajectory constraints, and energy consumption, we formulated the reward function by combining “terminal” and “process” rewards, as well as “strong” and “weak” incentive guidance, to reduce pre-exploration difficulty and accelerate convergence of the game network. Additionally, this paper introduces a LOS angle rate correction factor into the twin delayed deep deterministic policy gradient (TD3) algorithm, thereby enhancing the sensitivity of high-speed UAVs to changes in LOS rate and the accuracy of evasion timing, which improves the effectiveness and adaptive capability of the intelligent maneuvering strategy. Monte Carlo simulation results demonstrate that the proposed method achieves a high level of evasion performance, balancing energy optimization against the requisite miss distance for high-speed UAVs, and accomplishes efficient evasion under highly challenging PE game scenarios.
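The abstract above describes a reward that combines "terminal" and "process" terms with "strong" and "weak" incentives. A minimal sketch of that structure follows, assuming a small per-step penalty on trajectory deviation and control effort (the weak, process part) and a large signed bonus at episode end (the strong, terminal part). The function name `step_reward`, its arguments, and every weight are hypothetical; the abstract does not give the actual reward terms or their values.

```python
def step_reward(done, evaded, track_error, control_effort,
                w_track=0.01, w_energy=0.001, terminal_bonus=10.0):
    """Hypothetical shaped reward, not the LSRC-TD3 reward function.

    done           -- True on the last step of the episode
    evaded         -- True if the UAV evaded the pursuer (only used when done)
    track_error    -- deviation from the desired trajectory at this step
    control_effort -- commanded overload or similar energy proxy
    """
    # Weak "process" incentive: small dense penalties applied on every step.
    process = -w_track * abs(track_error) - w_energy * control_effort ** 2
    if not done:
        return process
    # Strong "terminal" incentive: large signed reward for the final outcome.
    terminal = terminal_bonus if evaded else -terminal_bonus
    return process + terminal
```

Keeping the process weights small relative to `terminal_bonus` means the dense terms guide early exploration without outweighing the outcome that the agent is ultimately trained for.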
