Decentralized Multi-Agent Pursuit Using Deep Reinforcement Learning

Souza, Cristino de; Newbury, Rhys; Cosgun, Akansel; Castillo, Pedro; Vidolov, Boris; Kulić, Dana

doi:10.1109/lra.2021.3068952

Cited by 75 publications (37 citation statements); references 34 publications


“…Theorem 1 (PE game): For differential game (1)–(3) and (11), the optimal strategies are given by (20) and the corresponding value function is $V(x) = t_E(x)$, where $t_E(x)$ is given by (19).…”

Section: PE Game (mentioning, confidence: 99%)

“…Via modeling in differential games and calculating numerically, reach-avoid method is developed for TAD-like problems [17]–[19], where the player has to come to a region while avoiding another region. Besides, reinforcement learning (RL) is also used to solve these games or HJI equation in many works [10], [20]. However, these numerical methods have obvious weakness in computation time and solution accuracy.…”

Section: Introduction (mentioning, confidence: 99%)


Pursuit-evasion differential games of players with different speeds in spaces of different dimensions

Li, Wang, Xie

2022

Preprint


We study pursuit-evasion differential games between a faster pursuer moving in 3D space and an evader moving in a plane. We first extend the well-known Apollonius circle to 3D space and use it to construct the isochron for the two players. We then consider cases with and without a static target and derive the corresponding optimal strategies using the concept of isochron. To establish the optimality of the proposed strategies, we give the value functions and prove that they solve the Hamilton-Jacobi-Isaacs equation. Simulations comparing the proposed strategies with other classical strategies confirm the optimality of the proposed strategies.
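The abstract builds on the Apollonius circle. Below is a minimal sketch of the classical construction. It is not the authors' isochron derivation, and it rests on these assumptions: the pursuer P moves in 3D at speed v_P, and the evader E moves in the plane z = 0 at speed v_E. We write alpha = v_E / v_P < 1, and both players move in straight lines at full speed. The points X that both players reach at the same time satisfy |X - E| = alpha |X - P|. This set is a sphere with centre (E - alpha^2 P) / (1 - alpha^2) and radius alpha |E - P| / (1 - alpha^2). Its trace on the evader's plane bounds the planar region the evader can reach strictly before the pursuer. The function names are hypothetical.

```python
import numpy as np


def apollonius_sphere(pursuer, evader, alpha):
    """Sphere of points X with |X - E| = alpha * |X - P|, alpha = v_E / v_P in (0, 1).

    pursuer, evader: 3D positions. Returns (centre, radius).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha = v_E / v_P must lie in (0, 1) for a faster pursuer")
    P = np.asarray(pursuer, dtype=float)
    E = np.asarray(evader, dtype=float)
    k = 1.0 - alpha**2
    centre = (E - alpha**2 * P) / k
    radius = alpha * np.linalg.norm(E - P) / k
    return centre, radius


def planar_apollonius_circle(pursuer, evader_xy, alpha):
    """Trace of the sphere on the evader's plane, assumed here to be z = 0."""
    E = np.array([evader_xy[0], evader_xy[1], 0.0])
    centre, radius = apollonius_sphere(pursuer, E, alpha)
    # E lies inside the sphere, so |centre_z| <= radius and the trace is a circle;
    # max(..., 0) only guards against floating-point round-off.
    r_plane = np.sqrt(max(radius**2 - centre[2]**2, 0.0))
    return centre[:2], r_plane


centre, r = planar_apollonius_circle(pursuer=(0.0, 0.0, 1.0), evader_xy=(1.0, 0.0), alpha=0.5)
print(centre, r)  # centre ≈ (1.333, 0.0), radius = sqrt(7)/3 ≈ 0.882
```

The pursuer's height enters only through the centre's z-coordinate. A pursuer farther from the plane gives a larger sphere, but the sphere's centre also sits farther below the plane. The 3D construction then reduces to a planar circle.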


“…With respect to cooperative MARL research for the MVP game, the multi-agent system is modeled using Markov decision processes (MDP) [9], and a neural network can be used to approximate the complex objective function [10]. Cristino et al used the Twin Delayed Deep Deterministic Policy Gradient (TD3) to demonstrate a real-world pursuit-evasion in open environment with boundaries [11]. Timothy used the Deep Deterministic Policy Gradient (DDPG) with omnidirectional agents [12].…”

Section: Introduction (mentioning, confidence: 99%)

T3OMVP: A Transformer-based Time and Team Reinforcement Learning Scheme for Observation-constrained Multi-Vehicle Pursuit in Urban Area

Zheng, Wu, Wang et al.

2022

Preprint


Smart Internet of Vehicles (IoVs) combined with Artificial Intelligence (AI) will contribute to vehicle decision-making in the Intelligent Transportation System (ITS). Multi-Vehicle Pursuit (MVP) games, in which multiple vehicles cooperate to capture mobile targets, are gradually becoming a hot research topic. Although MVP has seen some progress in open-space environments, urban areas challenge MVP solutions with complicated road structures and restricted moving spaces. In this paper we define an Observation-constrained MVP (OMVP) problem and propose a Transformer-based Time and Team Reinforcement Learning scheme (T3OMVP) to address it. First, a new multi-vehicle pursuit model is constructed based on decentralized partially observable Markov decision processes (Dec-POMDP) to instantiate this problem. Second, by introducing and modifying a transformer-based observation sequence, QMIX is redefined to adapt to the complicated road structure, restricted moving spaces, and constrained observations, so as to control the vehicles in pursuing the target using their observations. Third, a multi-intersection urban environment is built to verify the proposed scheme. Extensive experimental results show that the proposed T3OMVP scheme improves on state-of-the-art QMIX approaches by 9.66% to 106.25%. Code is available at https://github.com/pipihaiziguai/T3OMVP.
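The abstract builds on QMIX. The central idea of QMIX is a mixing network that combines per-agent action values into a joint value Q_tot. Hypernetworks produce the mixing weights from the global state and keep them non-negative, so Q_tot is monotonic in each agent's value. The sketch below illustrates only that standard QMIX mixer, not the transformer-based T3OMVP modification. Several parts are simplifying assumptions:

- the class name;
- the random, untrained hypernetwork weights;
- the linear final bias (standard QMIX uses a small two-layer network for it).

```python
import numpy as np


def elu(x):
    # ELU activation; np.minimum avoids overflow in expm1 for large positive x.
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


class MonotonicMixer:
    """QMIX-style mixer (hypothetical sketch with untrained random hypernetworks)."""

    def __init__(self, n_agents, state_dim, embed_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Linear hypernetworks: global state -> mixer weights and biases.
        self.hyper_w1 = rng.normal(size=(state_dim, n_agents * embed_dim))
        self.hyper_b1 = rng.normal(size=(state_dim, embed_dim))
        self.hyper_w2 = rng.normal(size=(state_dim, embed_dim))
        # Assumption: a linear state-dependent final bias (QMIX uses a 2-layer net).
        self.hyper_b2 = rng.normal(size=state_dim)

    def __call__(self, agent_qs, state):
        agent_qs = np.asarray(agent_qs, dtype=float)
        state = np.asarray(state, dtype=float)
        # abs() keeps every mixing weight >= 0, so Q_tot is non-decreasing in each Q_i.
        w1 = np.abs(state @ self.hyper_w1).reshape(self.n_agents, self.embed_dim)
        b1 = state @ self.hyper_b1
        hidden = elu(agent_qs @ w1 + b1)
        w2 = np.abs(state @ self.hyper_w2)
        b2 = state @ self.hyper_b2
        return float(hidden @ w2 + b2)


mixer = MonotonicMixer(n_agents=3, state_dim=5)
state = np.linspace(-1.0, 1.0, 5)
q = np.array([0.1, -0.4, 0.7])
q_better = q.copy()
q_better[1] += 1.0  # agent 1's value improves, the others are unchanged
print(mixer(q, state) <= mixer(q_better, state))  # → True
```

This monotonicity lets each vehicle pick its greedy action from its own Q-value during decentralized execution while still maximizing Q_tot.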


“…Recently, deep reinforcement learning (DRL)-based navigation techniques have made rapid progress. The DRL has been proven to be applied to various mobile robotics fields, such as collision avoidance, object transportation, multi-robot navigation, and social navigation [9][10][11]. Among them, DRL-based object transportation techniques have attracted attention from many researchers because DRL can solve tricky issues of conventional methods [12][13][14].…”

Section: Introduction (mentioning, confidence: 99%)

Cooperative Object Transportation Using Curriculum-Based Deep Reinforcement Learning

Eoh, Park

2021

Sensors


This paper presents a cooperative object transportation technique using curriculum-based deep reinforcement learning (DRL). Previous studies on object transportation relied heavily on complex and intractable controls, such as grasping, pushing, and caging. Recently proposed DRL-based object transportation techniques have shown improved performance without precise controller design. However, DRL-based techniques not only take a long time to learn their policies but sometimes fail to learn at all, because it is difficult to learn a DRL policy from random actions alone. We therefore propose two curricula for efficient learning of object transportation: region-growing and single- to multi-robot. During learning, the region-growing curriculum gradually expands the region in which the object is initialized. By restricting the working area, this step-by-step learning raises the success probability of object transportation. In the single- to multi-robot curriculum, multiple robots easily learn a new policy by exploiting the pre-trained policy of a single robot, which helps them learn a transport strategy through trial and error. Simulation results verify the proposed techniques.
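A region-growing curriculum can be pictured as a sampler whose initialization region widens as training progresses. The sketch below illustrates the idea under assumptions the abstract does not state:

- the region is a disk around a reference point (here, the goal);
- the disk grows by a fixed step once the success rate over the most recent block of episodes reaches a threshold.

The class name and every parameter value are hypothetical.

```python
import math
import random


class RegionGrowingCurriculum:
    """Hypothetical region-growing curriculum for object start positions."""

    def __init__(self, r_init=0.5, r_max=5.0, growth=0.5, threshold=0.8, window=20):
        self.radius = r_init        # current radius of the initialization region
        self.r_max = r_max          # radius of the full task region
        self.growth = growth        # radius increase per curriculum stage
        self.threshold = threshold  # success rate required to advance
        self.window = window        # number of recent episodes evaluated
        self.results = []

    def sample_start(self, goal, rng=random):
        # Uniform sample over a disk of the current radius centred on `goal`.
        angle = rng.uniform(0.0, 2.0 * math.pi)
        dist = self.radius * math.sqrt(rng.random())
        return goal[0] + dist * math.cos(angle), goal[1] + dist * math.sin(angle)

    def report(self, success):
        # Record one episode; widen the region once the recent success rate is high enough.
        self.results.append(bool(success))
        if len(self.results) < self.window:
            return
        rate = sum(self.results[-self.window:]) / self.window
        if rate >= self.threshold and self.radius < self.r_max:
            self.radius = min(self.radius + self.growth, self.r_max)
            self.results.clear()  # evaluate afresh at the new difficulty


curriculum = RegionGrowingCurriculum(r_init=1.0, r_max=2.0, growth=0.5, window=10)
for _ in range(10):
    curriculum.report(True)
print(curriculum.radius)  # → 1.5
```

A success-rate trigger is only one possible schedule. A fixed schedule that grows the region every N episodes would also match the abstract's description of gradual expansion.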
