2016
DOI: 10.1007/s10846-015-0317-9

A Learning Invader for the “Guarding a Territory” Game

Abstract: This paper explores the use of a learning algorithm in the "guarding a territory" game. The game takes place in continuous time: a single learning invader tries to get as close as possible to a territory before being captured by a guard. Previous research has approached the problem by letting only the guard learn; we examine the complementary setting, in which only the invader learns. Furthermore, in our case the guard is superior (faster) to the invader. We will also consider using mo…
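
As one way to make the setup concrete, here is a minimal Q-learning sketch of a learning invader facing a faster guard. Everything below is an illustrative assumption rather than the paper's method: the grid discretization, the fixed greedy guard (moving two cells per turn to model its speed advantage), the reward shaping, and names such as `guard_policy` and `train` are invented for the sketch; the paper itself works in continuous time.

```python
import numpy as np

# Sketch of a Q-learning invader in a "guarding a territory" game.
# The environment, discretization, and reward shaping are illustrative
# assumptions, not the paper's formulation. Only the invader learns;
# the guard plays a fixed pursuit strategy and is faster.

GRID = 10                                      # discretized square field
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # invader moves: N, S, E, W
TERRITORY = np.array([0, 0])                   # territory at a corner cell

def guard_policy(guard, invader):
    """Fixed guard: step greedily toward the invader, two cells per
    turn, modeling its speed advantage over the one-cell invader."""
    g = guard.copy()
    for _ in range(2):
        d = invader - g
        step = np.sign(d)
        if abs(d[0]) >= abs(d[1]):   # move along the dominant axis
            g[0] += step[0]
        else:
            g[1] += step[1]
    return np.clip(g, 0, GRID - 1)

def train(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    # Q-table indexed by (invader cell, guard cell, action)
    Q = np.zeros((GRID, GRID, GRID, GRID, len(ACTIONS)))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        inv = np.array([GRID - 1, GRID - 1])   # invader starts far corner
        grd = np.array([GRID // 2, GRID // 2]) # guard starts mid-field
        for _ in range(50):                    # episode step limit
            s = (*inv, *grd)
            a = (rng.integers(len(ACTIONS)) if rng.random() < eps
                 else int(np.argmax(Q[s])))    # epsilon-greedy action
            inv = np.clip(inv + ACTIONS[a], 0, GRID - 1)
            grd = guard_policy(grd, inv)
            captured = np.array_equal(inv, grd)
            # reward: negative distance to territory, big penalty on capture
            r = -np.linalg.norm(inv - TERRITORY) - (100.0 if captured else 0.0)
            s2 = (*inv, *grd)
            Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s][a])
            if captured or np.array_equal(inv, TERRITORY):
                break
    return Q
```

A greedy policy read off the learned Q-table then gives the invader's approach path; in the paper's continuous-time game the learner would instead act over a continuous state space.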

Cited by 20 publications (24 citation statements) · References 15 publications

“…where $k_1 = \alpha^2 x_{P_i}^0$ and $k_2 = (1-\alpha^2)\,l + \alpha^2 x_{P_i}^0$. For clarity, it can be seen that $\bar{B}_2^1$ can be expressed by a base curve in (11), that is, $\bar{B}_2^1 = F_2^1(x_{P_i}^0)$. Then, we focus on the case $x_p^* = 0$, namely, $p^* = m$. Denote this part of $\bar{B}^1$ by $\bar{B}_1^1$, which is the left orange curve in Fig.…”
Section: B. Two Pursuers Versus One Evader (mentioning; confidence: 99%)
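
Taken at face value, $k_1$ and $k_2$ differ by a fixed fraction of $l$; assuming $\alpha \in (0,1)$ (likely a speed ratio, though the excerpt does not define it), the gap is:

```latex
k_2 - k_1 = (1-\alpha^2)\,l + \alpha^2 x_{P_i}^0 - \alpha^2 x_{P_i}^0 = (1-\alpha^2)\,l
```

so $k_2$ is the convex combination of $l$ and $x_{P_i}^0$ with weight $\alpha^2$ on the pursuer coordinate, and it always lies between $l$ and $x_{P_i}^0$, moving toward $x_{P_i}^0$ as $\alpha \to 1$.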
“…For example, in collision avoidance and path planning, the question is how a group of vehicles can reach some target set, or escape from a bounded region through an exit, while avoiding dangerous situations such as collisions with static or moving obstacles [6]–[8]. In region pursuit games, multiple pursuers are used to intercept multiple adversarial intruders [9]–[11]. In safety verification, an agent often needs to judge whether it can guarantee its arrival in a safe region despite numerous dynamic dangers, such as disturbances and adversaries [12].…”
Section: Introduction (mentioning; confidence: 99%)
“…Reinforcement learning (RL) is a category of machine learning algorithms that has garnered considerable attention over the past decade [9]. In RL, a controllable entity, or agent, interacts with its environment and receives information in return in the form of states and rewards [10]. Through training, the agent learns to map states to actions so as to maximize its long-term reward.…”
Section: Introduction (mentioning; confidence: 99%)
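
The state-action-reward loop described in this excerpt is usually formalized with a temporal-difference update; as a generic example (standard Q-learning, not necessarily the variant used in [9] or [10]):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \eta \Bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Bigr]
```

where $\eta$ is the learning rate and $\gamma \in [0,1)$ discounts future rewards, which is what "maximizing long-term reward" amounts to in practice.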
“…An LMPC is used to control the UAV team during formation flight, while a combination of decentralized LMPC and FL is used to solve the problem of dynamic encirclement. The switching decision is made by a fuzzy logic controller derived using a fuzzy Q-learning approach [47,48] according to the surrounding factors. We concern ourselves with a decentralized high-level controller, where each team member generates the path required to respect the line-abreast formation and encirclement conditions.…”
Section: Introduction (mentioning; confidence: 99%)
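
For intuition about how a fuzzy rule base can drive such a switching decision, here is a deliberately tiny sketch in Python. It is not the cited controller: the single input (distance to a target), the membership functions, the thresholds, and every name below are assumptions invented for illustration, and a fuzzy Q-learning scheme as in [47,48] would additionally learn the rule consequents rather than fix them.

```python
# Hypothetical sketch of a fuzzy switching decision between a formation
# controller and an encirclement controller. The membership functions and
# the single input (distance to target) are illustrative assumptions only.

def mu_far(d, lo=20.0, hi=60.0):
    """Membership of 'target is far': ramps from 0 to 1 between lo and hi."""
    return min(1.0, max(0.0, (d - lo) / (hi - lo)))

def mu_near(d, lo=20.0, hi=60.0):
    """Membership of 'target is near': complement of 'far'."""
    return 1.0 - mu_far(d, lo, hi)

def switching_weight(distance_to_target):
    """Defuzzified weight in [0, 1]: 1 -> pure formation control,
    0 -> pure encirclement. Weighted-average defuzzification of two rules:
      Rule 1: IF far  THEN keep line-abreast formation (output 1).
      Rule 2: IF near THEN switch to encirclement     (output 0)."""
    w_far = mu_far(distance_to_target)
    w_near = mu_near(distance_to_target)
    return (w_far * 1.0 + w_near * 0.0) / (w_far + w_near)

def blend_controls(distance_to_target, u_formation, u_encircle):
    """Blend the two (hypothetical) low-level commands by the fuzzy weight,
    e.g. blend_controls(45.0, [1.0, 0.0], [0.2, 0.8])."""
    w = switching_weight(distance_to_target)
    return [w * uf + (1.0 - w) * ue for uf, ue in zip(u_formation, u_encircle)]
```

Blending by a fuzzy weight rather than hard-switching avoids chattering at the decision boundary, which is one common reason such papers prefer fuzzy logic over a crisp threshold.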