2021
DOI: 10.1109/access.2021.3113350
QSOD: Hybrid Policy Gradient for Deep Multi-agent Reinforcement Learning

Abstract: When individuals interact with one another to accomplish specific goals, they learn from others' experiences to achieve the tasks at hand. The same holds for learning in virtual environments, such as video games. Deep multi-agent reinforcement learning has shown promising results on many challenging tasks. To demonstrate its viability, most algorithms use value decomposition for multiple agents. To guide each agent's behavior, value decomposition is used to decompose the combined Q-value of th…
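The abstract's central idea, learning a joint Q-value that factorizes across agents, can be illustrated with a minimal additive (VDN-style) decomposition. This is a generic sketch rather than QSOD's actual hybrid policy-gradient scheme (the abstract is truncated above), and the names `AgentQNet` and `joint_q_additive` are hypothetical:

```python
import torch
import torch.nn as nn

class AgentQNet(nn.Module):
    """Per-agent utility network: maps a local observation to
    Q-values over that agent's actions."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def joint_q_additive(agent_qs: list[torch.Tensor],
                     actions: list[torch.Tensor]) -> torch.Tensor:
    """Additive decomposition: the combined Q-value is the sum of
    each agent's Q-value for its chosen action, so the joint value
    is trained centrally while each agent acts on its own utility."""
    chosen = [q.gather(-1, a.unsqueeze(-1)).squeeze(-1)
              for q, a in zip(agent_qs, actions)]
    return torch.stack(chosen, dim=0).sum(dim=0)
```

Monotonic mixers such as QMIX generalize this plain sum with a state-conditioned mixing network while keeping the greedy joint action consistent with each agent's greedy local action.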

Cited by 7 publications (1 citation statement)
References 13 publications
“…In cooperative MARL, state-of-the-art approaches widely learn the policy of each agent by decomposing the joint value function [3]–[8] and are known to outperform the other main class, actor-critic methods, in general [9], [10]. For exploration, even the former mostly rely on a simple mechanism called ε-greedy, where exploration rests solely on a small probability of random action.…”
Section: Introduction (mentioning)
confidence: 99%
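The "ε-greedy" mechanism the citing paper refers to is the standard exploration rule for value-based agents. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
    """With probability epsilon pick a uniformly random action;
    otherwise act greedily on the agent's Q-values."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# Usage: each agent typically anneals epsilon from ~1.0 toward a
# small floor (e.g. 0.05) over the course of training.
rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.1, 0.5, 0.2]), epsilon=0.1, rng=rng)
```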