This paper proposes a cooperative search algorithm that enables a swarm of unmanned aerial vehicles (UAVs) to capture moving targets. The algorithm exploits prior information and a target probability map while constraining inter-UAV distance for safety and communication. First, a rasterized environmental cognitive map is created to characterize the task area. Second, based on Bayesian theory, the posterior probability of a target's existence is updated using UAV detection information. Third, the predicted probability distribution of the dynamic, time-sensitive target is obtained by calculating the target transition probability. Fourth, a customized information-interaction mechanism switches the interaction strategy and content according to the communication distance, enabling cooperative decision-making in the UAV swarm. Finally, receding-horizon (rolling time-domain) optimization generates the interaction information, realizing interactive behavior and autonomous decision-making among the swarm members. Simulation results show that the proposed algorithm effectively completes a cooperative moving-target search under communication-distance constraints and still cooperates effectively in unexpected situations such as a fire.
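The abstract does not specify the sensor or target-motion models, but the Bayesian update and motion-prediction steps it describes can be sketched as follows. The detection probability p_d, false-alarm probability p_f, and the 4-neighbour random-walk motion model are assumptions for illustration only, not the paper's actual parameters.

```python
import numpy as np

def bayes_update(prob_map, cell, detected, p_d=0.9, p_f=0.1):
    """Update the target-existence probability of one grid cell after a UAV scan
    (assumed Bernoulli sensor model with detection prob p_d and false-alarm prob p_f)."""
    p = prob_map[cell]
    if detected:
        like_target, like_empty = p_d, p_f
    else:
        like_target, like_empty = 1.0 - p_d, 1.0 - p_f
    posterior = like_target * p / (like_target * p + like_empty * (1.0 - p))
    updated = prob_map.copy()
    updated[cell] = posterior
    return updated

def predict_motion(prob_map, p_stay=0.6):
    """Propagate probabilities one step under an assumed 4-neighbour random walk;
    np.roll wraps at the borders, a simplification of a real transition model."""
    p_move = (1.0 - p_stay) / 4.0
    shifted = sum(np.roll(prob_map, s, axis=a) for a, s in [(0, 1), (0, -1), (1, 1), (1, -1)])
    return p_stay * prob_map + p_move * shifted

# Example: uniform prior on a 10x10 rasterized cognitive map, one detection, one prediction step
cognitive_map = np.full((10, 10), 0.05)
cognitive_map = bayes_update(cognitive_map, (3, 7), detected=True)
cognitive_map = predict_motion(cognitive_map)
```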
Deep reinforcement learning has grown in significance within artificial intelligence. It has achieved remarkable results and offers a fresh approach to previously intractable problems, such as controlling a robotic arm and discovering game strategies. This survey first reviews the two primary categories of deep reinforcement learning methods: those based on value functions and those based on policy gradients. It then summarizes the limitations of current approaches and the difficulties deep reinforcement learning faces in related domains, and examines future application directions for deep reinforcement learning in the military sphere. Finally, the growing trend toward deep reinforcement learning techniques in military applications is projected.
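As a rough illustration of the two families the survey distinguishes, the tabular stand-ins below contrast a value-function-based update with a policy-gradient-based update. Function names, hyperparameters, and the toy problem are illustrative assumptions, not methods from the surveyed works.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Value-function-based update: move Q(s, a) toward the one-step TD target."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

def reinforce_step(theta, grad_log_pi, episode_return, lr=0.01):
    """Policy-gradient-based update: ascend the return-weighted score function."""
    return theta + lr * episode_return * grad_log_pi

# Toy usage: 4 states, 2 actions for the value-based case; a 3-parameter policy otherwise
Q = np.zeros((4, 2))
Q = q_learning_step(Q, s=0, a=1, r=1.0, s_next=2)
theta = np.zeros(3)
theta = reinforce_step(theta, grad_log_pi=np.array([0.2, -0.1, 0.4]), episode_return=5.0)
```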
Invalid action masking is a practical technique in deep reinforcement learning that prevents agents from taking invalid actions. Existing approaches rely on action masking during both policy training and execution. This study focuses on developing reinforcement learning algorithms that incorporate action masking during training but can be deployed without action masking during policy execution. The study begins with a theoretical analysis that elucidates the distinction between the naive policy gradient and the invalid action policy gradient. Based on this analysis, we demonstrate that the naive policy gradient is a valid gradient and is equivalent to the proposed composite-objective algorithm, which optimizes the masked policy and the original policy in parallel. Moreover, we propose an off-policy algorithm for invalid action masking that employs the masked policy for sampling while optimizing the original policy. To compare the effectiveness of these algorithms, experiments are conducted using a simplified real-time strategy (RTS) game simulator, Gym-μRTS. Based on the empirical findings, we recommend the off-policy algorithm for most tasks and the composite-objective algorithm for more complex tasks.
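For reference, invalid action masking is typically implemented at the logit level: invalid actions receive a large negative logit so the softmax assigns them effectively zero probability, while the unmasked logits define the original policy. The sketch below illustrates this assumed mechanism; it does not reproduce the paper's specific objectives or the Gym-μRTS action space.

```python
import numpy as np

def masked_policy(logits, valid_mask):
    """Masked policy: invalid actions get a large negative logit, so the softmax
    assigns them ~0 probability."""
    masked_logits = np.where(valid_mask, logits, -1e8)
    exp = np.exp(masked_logits - masked_logits.max())
    return exp / exp.sum()

def original_policy(logits):
    """Original (unmasked) policy: the distribution executed without the mask."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

logits = np.array([2.0, 0.5, -1.0, 1.2])
valid = np.array([True, False, True, True])

pi_masked = masked_policy(logits, valid)  # used for sampling during training
pi_plain = original_policy(logits)        # optimized so it can be executed without masking
```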