2016
DOI: 10.1007/s10846-015-0317-9

A Learning Invader for the “Guarding a Territory” Game

Abstract: This paper explores the use of a learning algorithm in the "guarding a territory" game. The game takes place in continuous time: a single learning invader tries to get as close as possible to a territory before being captured by a guard. Previous research has approached the problem by letting only the guard learn; we examine the complementary setting, in which only the invader learns. Furthermore, in our case the guard is superior (faster) to the invader. We will also consider using mo…
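
As one way to make the setup concrete, here is a minimal Q-learning sketch of a learning invader facing a faster guard. Everything below is an illustrative assumption rather than the paper's method: the grid discretization, the fixed greedy guard (moving two cells per turn to model its speed advantage), the reward shaping, and names such as `guard_policy` and `train` are invented for the sketch; the paper itself works in continuous time.

```python
import numpy as np

# Sketch of a Q-learning invader in a "guarding a territory" game.
# The environment, discretization, and reward shaping are illustrative
# assumptions, not the paper's formulation. Only the invader learns;
# the guard plays a fixed pursuit strategy and is faster.

GRID = 10                                      # discretized square field
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # invader moves: N, S, E, W
TERRITORY = np.array([0, 0])                   # territory at a corner cell

def guard_policy(guard, invader):
    """Fixed guard: step greedily toward the invader, two cells per
    turn, modeling its speed advantage over the one-cell invader."""
    g = guard.copy()
    for _ in range(2):
        d = invader - g
        step = np.sign(d)
        if abs(d[0]) >= abs(d[1]):   # move along the dominant axis
            g[0] += step[0]
        else:
            g[1] += step[1]
    return np.clip(g, 0, GRID - 1)

def train(episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    # Q-table indexed by (invader cell, guard cell, action)
    Q = np.zeros((GRID, GRID, GRID, GRID, len(ACTIONS)))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        inv = np.array([GRID - 1, GRID - 1])   # invader starts far corner
        grd = np.array([GRID // 2, GRID // 2]) # guard starts mid-field
        for _ in range(50):                    # episode step limit
            s = (*inv, *grd)
            a = (rng.integers(len(ACTIONS)) if rng.random() < eps
                 else int(np.argmax(Q[s])))    # epsilon-greedy action
            inv = np.clip(inv + ACTIONS[a], 0, GRID - 1)
            grd = guard_policy(grd, inv)
            captured = np.array_equal(inv, grd)
            # reward: negative distance to territory, big penalty on capture
            r = -np.linalg.norm(inv - TERRITORY) - (100.0 if captured else 0.0)
            s2 = (*inv, *grd)
            Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s][a])
            if captured or np.array_equal(inv, TERRITORY):
                break
    return Q
```

A greedy policy read off the learned Q-table then gives the invader's approach path; in the paper's continuous-time game the learner would instead act over a continuous state space.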

Cited by 20 publications (24 citation statements) · References 15 publications

“…where $k_1 = \alpha^2 x_{P_i}^0$ and $k_2 = (1-\alpha^2)\,l + \alpha^2 x_{P_i}^0$. For clarity, it can be seen that $\bar{B}_2^1$ can be expressed by a base curve in (11), that is, $\bar{B}_2^1 = F_2^1(x_{P_i}^0)$. Then, we focus on the case $x_p^* = 0$, namely, $p^* = m$. Denote this part of $\bar{B}^1$ by $\bar{B}_1^1$, which is the left orange curve in Fig.…”
Section: B. Two Pursuers Versus One Evader (mentioning; confidence: 99%)
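
Taken at face value, $k_1$ and $k_2$ differ by a fixed fraction of $l$; assuming $\alpha \in (0,1)$ (likely a speed ratio, though the excerpt does not define it), the gap is:

```latex
k_2 - k_1 = (1-\alpha^2)\,l + \alpha^2 x_{P_i}^0 - \alpha^2 x_{P_i}^0 = (1-\alpha^2)\,l
```

so $k_2$ is the convex combination of $l$ and $x_{P_i}^0$ with weight $\alpha^2$ on the pursuer coordinate, and it always lies between $l$ and $x_{P_i}^0$, moving toward $x_{P_i}^0$ as $\alpha \to 1$.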
“…For example, in collision avoidance and path planning, the question is how a group of vehicles can reach some target set, or escape from a bounded region through an exit, while avoiding dangerous situations such as collisions with static or moving obstacles [6]–[8]. In region pursuit games, multiple pursuers are used to intercept multiple adversarial intruders [9]–[11]. In safety verification, an agent often needs to judge whether it can guarantee its arrival in a safe region despite numerous dynamic dangers, such as disturbances and adversaries [12].…”
Section: Introduction (mentioning; confidence: 99%)
“…Reinforcement learning (RL) is a category of machine learning algorithms that has garnered considerable attention over the past decade [9]. In RL, a controllable entity, or agent, interacts with its environment and receives information in return in the form of states and rewards [10]. Through training, the agent learns to map states to actions so as to maximize its long-term reward.…”
Section: Introduction (mentioning; confidence: 99%)
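
The state-action-reward loop described in this excerpt is usually formalized with a temporal-difference update; as a generic example (standard Q-learning, not necessarily the variant used in [9] or [10]):

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \eta \Bigl[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \Bigr]
```

where $\eta$ is the learning rate and $\gamma \in [0,1)$ discounts future rewards, which is what "maximizing long-term reward" amounts to in practice.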
“…An LMPC is used to control the UAV team during formation flight, while a combination of decentralized LMPC and FL is used to solve the problem of dynamic encirclement. The switching decision is made by a fuzzy logic controller derived using a fuzzy Q-learning approach [47,48] according to the surrounding factors. We concern ourselves with a decentralized high-level controller, where each team member generates the path required to respect the line-abreast formation and encirclement conditions.…”
Section: Introduction (mentioning; confidence: 99%)
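
For intuition about how a fuzzy rule base can drive such a switching decision, here is a deliberately tiny sketch in Python. It is not the cited controller: the single input (distance to a target), the membership functions, the thresholds, and every name below are assumptions invented for illustration, and a fuzzy Q-learning scheme as in [47,48] would additionally learn the rule consequents rather than fix them.

```python
# Hypothetical sketch of a fuzzy switching decision between a formation
# controller and an encirclement controller. The membership functions and
# the single input (distance to target) are illustrative assumptions only.

def mu_far(d, lo=20.0, hi=60.0):
    """Membership of 'target is far': ramps from 0 to 1 between lo and hi."""
    return min(1.0, max(0.0, (d - lo) / (hi - lo)))

def mu_near(d, lo=20.0, hi=60.0):
    """Membership of 'target is near': complement of 'far'."""
    return 1.0 - mu_far(d, lo, hi)

def switching_weight(distance_to_target):
    """Defuzzified weight in [0, 1]: 1 -> pure formation control,
    0 -> pure encirclement. Weighted-average defuzzification of two rules:
      Rule 1: IF far  THEN keep line-abreast formation (output 1).
      Rule 2: IF near THEN switch to encirclement     (output 0)."""
    w_far = mu_far(distance_to_target)
    w_near = mu_near(distance_to_target)
    return (w_far * 1.0 + w_near * 0.0) / (w_far + w_near)

def blend_controls(distance_to_target, u_formation, u_encircle):
    """Blend the two (hypothetical) low-level commands by the fuzzy weight,
    e.g. blend_controls(45.0, [1.0, 0.0], [0.2, 0.8])."""
    w = switching_weight(distance_to_target)
    return [w * uf + (1.0 - w) * ue for uf, ue in zip(u_formation, u_encircle)]
```

Blending by a fuzzy weight rather than hard-switching avoids chattering at the decision boundary, which is one common reason such papers prefer fuzzy logic over a crisp threshold.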