2019
DOI: 10.1109/tsmc.2018.2865488
An Efficient Computing of Correlated Equilibrium for Cooperative $Q$-Learning-Based Multi-Robot Planning

Abstract: Recent advancements in deep reinforcement learning (DRL) techniques have sparked its multifaceted applications in the automation sector. Managing complex decision-making problems with DRL encourages its use in the nuclear industry for tasks such as optimizing radiation exposure to the personnel during normal operating conditions and potential accidental scenarios. However, the lack of efficient reward function and effective exploration strategy thwarted its implementation in the development of radiation-aware …

Cited by 10 publications (4 citation statements)
References 76 publications (48 reference statements)
“…Therefore, compared with traditional jamming decision-making methods, Q-Learning-based jamming decision-making methods can learn while fighting, which is expected to be a major research direction and future trend. Q-Learning is currently widely used in robot path planning [33,34], nonlinear control [35,36], and resource allocation and scheduling [37,38], and in recent years it has yielded concrete results in radar jamming decision-making. Xing et al [39,40] proposed applying the Q-Learning algorithm to radar countermeasures for the problem of unknown radar operating modes.…”
Section: Cognitive Radar Jamming Decision-Making Methods
confidence: 99%
“…where Q(s, a) estimates the action value after applying an action a in state s, α is the learning rate, γ is the discount factor, and r is the immediate reward received [26]. The main components of Q-learning are as follows.…”
Section: B. Q-Learning
confidence: 99%
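The update rule the quoted passage describes is the standard one-step tabular Q-learning update. A minimal sketch is below; the dictionary-based Q-table, the action set, and the hyperparameter values are illustrative assumptions, not details taken from the cited paper.

```python
# One-step tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# where alpha is the learning rate, gamma the discount factor,
# and r the immediate reward, as in the quoted passage.

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to the table Q and return the new value."""
    # Greedy bootstrap target: best estimated value from the next state.
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    # Temporal-difference update toward r + gamma * best_next.
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]


if __name__ == "__main__":
    Q = {}
    # With an empty table, a reward of 1.0 moves Q(0,'up') to alpha * 1.0 = 0.1.
    q_update(Q, 0, "up", 1.0, 1, ["up", "down"])
    print(Q[(0, "up")])
```

Unvisited state-action pairs default to 0.0 via `dict.get`, which keeps the table sparse; a real planner would wrap this update in an exploration loop (e.g. ε-greedy).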
“…These approaches [18,22] obtain the optimal decisions of two competing agents, where the opponent's action can be regarded as a special kind of equivalent action for the current agent. A computation rule for the joint action in the Q-value function has been proposed to reduce the computational complexity [23], and the Q-value function of the current agent has been designed using the neighborhood equivalent action [24]. Because these methods do not consider the incidence relations among all the agents, the memory needed to train the Q-values is smaller and the training process is faster.…”
Section: Introduction
confidence: 99%
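The complexity concern in the quote above comes from the size of the joint-action space: with n agents each choosing from |A| actions, a joint Q-table holds |S| · |A|^n entries, which is what the cited computation rules [23,24] try to shrink. A hedged sketch for two cooperative agents (state count and action names are illustrative, not from the paper):

```python
from itertools import product

# Joint-action Q-table for two cooperative agents.
# Each entry is indexed by (state, action_of_agent1, action_of_agent2),
# so the table has |S| * |A|^2 entries -- multiplicative growth that
# motivates the complexity-reduction rules discussed in the quote.

actions = ["up", "down", "left", "right"]  # |A| = 4, illustrative
states = range(5)                          # |S| = 5, illustrative

Q_joint = {
    (s, a1, a2): 0.0
    for s in states
    for a1, a2 in product(actions, actions)
}

# 5 states * 4 * 4 joint actions = 80 entries; with n agents the
# exponent grows to |A|**n, which quickly becomes intractable.
print(len(Q_joint))
```

By contrast, an independent-learner table per agent needs only |S| · |A| entries each, which is the kind of saving the neighborhood-equivalent-action design aims for.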