2014
DOI: 10.1609/aaai.v28i1.8886

Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs

Abstract: Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contribution…
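The abstract's notion of a Dynamic DCOP — a sequence of static DCOPs, each partially different from its predecessor — can be sketched as follows. This is a hypothetical illustration, not the paper's representation: each "DCOP" is reduced to a dict of constraint rewards, and `perturb` is an assumed helper that changes a fraction of those rewards between time steps.

```python
import random

random.seed(0)

def perturb(dcop, fraction=0.3):
    """Return a copy of the DCOP with a fraction of its constraint rewards changed."""
    new = dict(dcop)
    # Change at least one constraint so consecutive DCOPs are only partially alike.
    for key in random.sample(sorted(new), k=max(1, int(fraction * len(new)))):
        new[key] = random.randint(0, 10)
    return new

# Initial (static) DCOP: binary constraints mapped to rewards.
dcop_0 = {("x1", "x2"): 5, ("x2", "x3"): 3, ("x1", "x3"): 7}

# A Dynamic DCOP as a sequence: each time step's problem is a
# partial modification of the previous one.
sequence = [dcop_0]
for _ in range(3):
    sequence.append(perturb(sequence[-1]))
```

The constraint structure stays fixed here and only rewards drift; real Dynamic DCOPs may also add or remove variables and constraints between steps.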

Cited by 23 publications (13 citation statements)
References 33 publications
“…The Markovian DDCOPs algorithm [8] is a reactive online-learning method based on the joint action of agents. Unlike other methods, the proactive algorithms [9,27] consider the dependence between two consecutive time steps.…”
Section: Literature Review
confidence: 99%
“…šœŽ: š‘‹ ā†’ š“ is an onto total function, from variables to agents, which assigns the control of each variable š‘„ āˆˆ š‘‹ to an agent šœŽ(š‘„). A complete solution is a value assignment for all variables and the objective is to find a reward-maximal complete solution [8].…”
Section: Distributed Constraint Optimization Problemmentioning
confidence: 99%
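The DCOP definition quoted above — variables with finite domains, an onto mapping σ from variables to agents, and the objective of a reward-maximal complete solution — can be made concrete with a minimal sketch. All names and the exhaustive search are illustrative assumptions, not the cited algorithm:

```python
from itertools import product

# Variables, finite domains, and the onto mapping sigma: variables -> agents.
variables = ["x1", "x2"]
domains = {"x1": [0, 1], "x2": [0, 1]}
sigma = {"x1": "a1", "x2": "a2"}

# Each constraint maps a joint assignment of its scope to a reward.
constraints = [
    (("x1", "x2"), lambda v: 10 if v["x1"] != v["x2"] else 2),
    (("x1",), lambda v: 3 if v["x1"] == 1 else 0),
]

def reward(assignment):
    """Total reward of a complete assignment (sum over all constraints)."""
    return sum(f({x: assignment[x] for x in scope}) for scope, f in constraints)

# Exhaustive search for the reward-maximal complete solution
# (real DCOP algorithms solve this distributedly, not by enumeration).
best = max(
    (dict(zip(variables, vals)) for vals in product(*(domains[x] for x in variables))),
    key=reward,
)
print(best, reward(best))
```

Here the best complete solution assigns x1=1, x2=0 for a total reward of 13; DCOP algorithms reach such solutions through message passing among the agents σ assigns to the variables.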
“…Average reward MDPs are natural models of many non-terminating tasks, such as the call admission control and routing problem (Marbach, Mihatsch, and Tsitsiklis 2000) and the automatic guided vehicle routing problem (Ghavamzadeh and Mahadevan 2007). Average reward RL has received much attention in recent years (Ortner 2013; Mahadevan 2014; Nguyen et al. 2014).…”
Section: Introduction
confidence: 99%
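The average-reward criterion referenced above evaluates a policy by its long-run reward per time step, ρ = Σ_s d(s) r(s), where d is the stationary distribution of the Markov chain the policy induces. A minimal sketch with toy numbers (the transition matrix and rewards are invented for illustration):

```python
# Markov chain induced by a fixed policy: 2 states, toy numbers.
P = [[0.9, 0.1],
     [0.5, 0.5]]       # P[i][j] = probability of moving from state i to j
r = [1.0, 0.0]          # expected one-step reward in each state

# Power iteration for the stationary distribution: d <- d P until convergence.
d = [0.5, 0.5]
for _ in range(10000):
    d = [sum(d[i] * P[i][j] for i in range(2)) for j in range(2)]

# Long-run average reward per time step under this policy.
rho = sum(d[s] * r[s] for s in range(2))
print(rho)
```

For this chain the stationary distribution is (5/6, 1/6), giving ρ = 5/6; average-reward RL methods estimate and maximize this quantity without a discount factor, which is why it suits the non-terminating tasks the quote lists.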