Abstract: The ever-changing battlefield environment requires robust and adaptive technologies integrated into a reliable platform. Unmanned combat aerial vehicles (UCAVs) aim to integrate such advanced technologies while increasing the tactical capabilities of combat aircraft. As a research object, a common UCAV uses a neural-network fitting strategy to obtain the values of attack areas. However, this simple strategy can neither cope with complex environmental changes nor autonomously optimize decision-making…
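As a concrete reading of the baseline this abstract criticizes, the "neural network fitting strategy" amounts to supervised regression of precomputed attack-area boundaries. A minimal sketch, with invented features, synthetic data, and a least-squares fit standing in for the network:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training samples: (target aspect angle [rad], closing speed [m/s])
# mapped to a maximum launch range [km] defining the attack-area boundary.
# The functional form and coefficients are invented for illustration.
aspect = rng.uniform(0.0, np.pi, 200)
speed = rng.uniform(200.0, 600.0, 200)
max_range = (20.0 + 15.0 * np.cos(aspect) + 0.02 * speed
             + rng.normal(0.0, 0.5, 200))

# Least-squares fit of the boundary (a stand-in for the NN regressor).
X = np.column_stack([np.ones_like(aspect), np.cos(aspect), speed])
coef, *_ = np.linalg.lstsq(X, max_range, rcond=None)

def attack_area_range(a, v):
    """Predicted maximum launch range at aspect a [rad], closing speed v [m/s]."""
    return coef[0] + coef[1] * np.cos(a) + coef[2] * v
```

Such a fit reproduces the precomputed table well, but, as the abstract notes, it is static: it cannot adapt when the environment departs from the data it was fitted on.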
“…The strength of deep learning lies in perception, while reinforcement learning has great advantages in decision-making applications [7]. Deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning [8]. Using deep reinforcement learning together with the large amount of data provided by the deduction system to understand the battlefield situation, and conducting situational deduction on that basis, is the current mainstream research direction.…”
Section: Research On Key Technologies Of Auxiliary Decision-making
The future naval battlefield will become increasingly complex, and developing military auxiliary decision-making systems based on artificial intelligence and big-data technology is becoming a clear trend. This paper surveys the key technologies of auxiliary decision-making systems based on deep reinforcement learning. On this basis, it proposes a construction method for a naval-aviation sea-strike agent model and builds a training framework that uses the combat deduction system as the environment. Finally, it summarizes this work and outlines directions for future research.
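The training framework described above can be pictured as a standard agent–environment loop in which the combat deduction system plays the role of the environment. A minimal sketch; `DeductionSystem`, `SeaStrikeAgent`, and all of their internals are hypothetical placeholders, not the paper's actual interfaces:

```python
import random

class DeductionSystem:
    """Toy stand-in for the combat deduction system. State is the remaining
    (red, blue) strength; actions are abstracted to strike/scout/withdraw."""

    def reset(self):
        self.red, self.blue = 10, 10
        return (self.red, self.blue)

    def step(self, action):
        if action == 0:        # strike: trade attrition, favoring red
            self.blue -= 2
            self.red -= 1
        elif action == 1:      # scout: gathers information, no attrition here
            pass
        else:                  # withdraw: terminate the engagement
            pass
        done = self.red <= 0 or self.blue <= 0 or action == 2
        reward = float(self.red - self.blue)   # crude situational score
        return (self.red, self.blue), reward, done

class SeaStrikeAgent:
    """Placeholder random policy; a trained DRL policy would slot in here."""

    def act(self, state):
        return random.randrange(3)

def run_episode(env, agent, max_steps=20):
    """One rollout of the agent against the deduction system."""
    state, total = env.reset(), 0.0
    for _ in range(max_steps):
        state, reward, done = env.step(agent.act(state))
        total += reward
        if done:
            break
    return total
```

The point of the wrapper is that the deduction system only needs to expose reset/step semantics for any off-the-shelf DRL trainer to drive it.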
“…Li et al.19 also addressed the application of DRL to optimizing the online flight cruise control of UAVs. Li et al.20 solved the guidance problem of cluster vehicles via DQN while ensuring the availability and effectiveness of target resource allocation. The existing literature focuses mainly on trajectory planning and guidance control; little work addresses the penetration problem of vehicles under an attack–defense confrontation model.…”
Aiming at the coordination between maneuvering penetration and high-precision guidance in the complex flight missions of high-velocity vehicles, the manuscript studies a three-dimensional, high-dynamic intelligent maneuvering guidance strategy based on optimal control and deep reinforcement learning (DRL). A three-dimensional attack–defense model is established, and the maneuver guidance mission is decomposed into longitudinal and lateral directions. In the longitudinal direction, a maneuvering model with the instantaneous miss distance as the control variable is constructed, and the maximum principle is employed to obtain the optimal maneuver duration and start timing. In the lateral direction, a Markov decision process model of maneuver guidance is formulated by combining the guidance error and the miss distance at the encounter point, and the reward function is designed with both maneuver and guidance performance in mind. The DRL method is used to learn and train the maneuver strategy, and the training process is improved as well. Simulation results show that the intelligent maneuvering guidance strategy can improve penetration performance, reduce the influence of maneuvering flight on guidance accuracy, and ensure adaptability across changing flight missions.
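A reward of the kind the abstract describes, synthesizing guidance error and encounter-point miss distance, might look like the following. The weights, the saturation, and the kill-radius penalty are illustrative assumptions, not the paper's actual function:

```python
def maneuver_guidance_reward(guidance_error_m, intercept_miss_m,
                             w_guidance=1.0, w_penetration=0.5,
                             kill_radius_m=50.0):
    """Step reward for the lateral maneuver-guidance MDP (illustrative).

    guidance_error_m: distance from the predicted impact point to the target;
    intercept_miss_m: miss distance between interceptor and vehicle at the
    encounter point (larger is safer for the penetrating vehicle).
    """
    # Penalize guidance error so maneuvering does not ruin terminal accuracy.
    r_guidance = -w_guidance * guidance_error_m
    # Reward clearing the interceptor, saturated so the agent does not trade
    # unbounded lateral maneuver for ever-larger miss distance.
    r_penetration = w_penetration * min(intercept_miss_m, 4.0 * kill_radius_m)
    # Large terminal penalty when the interceptor gets inside its kill radius.
    r_killed = -1000.0 if intercept_miss_m < kill_radius_m else 0.0
    return r_guidance + r_penetration + r_killed
```

The structure captures the stated trade-off: the agent is paid for evading the interceptor only up to a saturation point, after which further maneuver buys nothing and only costs guidance accuracy.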
“…The application of RL also plays a great role in the path control of UAVs and self-driving vehicles. Zeng et al., Yang et al., and Li et al. have pointed out that UAV movement and task execution are continuous control problems in a changing environment, and that RL and the deep deterministic policy gradient method in DRL can better realize UAV control [24][25][26]. Compared with UAV path control, driverless driving in recent years is more complicated and takes place in a more complex environment.…”
Section: Reinforcement Learning In Path Planning
The transportation systems of countries with huge traffic flows are under great pressure in transportation planning and management. Vehicle path planning is one of the effective ways to alleviate this pressure. Deep reinforcement learning (DRL), as a state-of-the-art solution method for vehicle path planning, can better balance algorithmic capability against the complexity needed to reflect real situations. However, DRL has its own disadvantages of high search cost and premature convergence to local optima, as vehicle path-planning problems are usually set in complex environments with diverse action sets. In this paper, a mixed policy-gradient actor-critic (AC) model with a random escape term and a filter operation is proposed, in which the policy weight is both data driven and model driven. The empirical, data-driven method is used to improve the poor asymptotic performance, and the model-driven method is used to ensure the convergence speed of the whole model. At the same time, to prevent the model from converging to a local optimum, a random escape term is added to the policy-weight update to overcome the difficulty of optimizing the non-convex loss function; the random escape term helps explore the policy in more directions. In addition, filter optimization is introduced, and the step size of each iteration is selected through the filter optimization algorithm to achieve a better iterative effect. Numerical experiments show that the proposed model can improve the quality of the solution without losing accuracy, speed up convergence, and improve the utilization of data.
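The random escape term can be illustrated on a one-dimensional non-convex toy objective: a plain gradient step stays in a poor local basin, while adding a random term to the policy-weight update lets the iterate cross the barrier and find a better one. Everything below (the objective, step sizes, and escape scale) is invented for illustration, and the best-point tracking only loosely gestures at the paper's filter-based step acceptance:

```python
import numpy as np

rng = np.random.default_rng(42)

def objective(theta):
    """Non-convex toy loss: a local minimum near -1, global minimum near +1."""
    return (theta ** 2 - 1) ** 2 + 0.05 * (theta - 2) ** 2

def grad(theta):
    return 4 * theta * (theta ** 2 - 1) + 0.1 * (theta - 2)

def update_with_escape(theta, lr=0.02, escape_scale=0.2):
    """One policy-weight update: gradient step plus a random escape term."""
    return theta - lr * grad(theta) + escape_scale * rng.normal()

# Plain gradient descent started in the poor basin stays stuck near -1 ...
theta_gd = -1.0
for _ in range(2000):
    theta_gd -= 0.02 * grad(theta_gd)

# ... while the escape term lets the iterate also visit the better basin;
# we track the best point seen (a filter/acceptance rule would formalize
# which perturbed steps to keep).
theta, best = -1.0, -1.0
for _ in range(5000):
    theta = update_with_escape(theta)
    if objective(theta) < objective(best):
        best = theta
```

The escape scale matters: too small and the iterate cannot clear the barrier, too large and the update degenerates into a random walk, which is why coupling it with a filter-style step-size selection is a sensible design.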