2007
DOI: 10.1007/s10458-007-9013-x

Generalized multiagent learning with performance bound

Abstract: We present new Multiagent learning (MAL) algorithms with the general philosophy of policy convergence against some classes of opponents but otherwise ensuring high payoffs. We consider a 3-class breakdown of opponent types: (eventually) stationary, self-play and "other" (see Definition 4) agents. We start with ReDVaLeR that can satisfy policy convergence against the first two types and no-regret against the third, but it needs to know the type of the opponents. This serves as a baseline to delineate the diffic…
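The abstract's "no-regret" guarantee against "other" opponents refers to the standard external-regret criterion: the agent's average payoff should approach that of the best fixed action in hindsight. A minimal sketch of that criterion is given below; the payoff values and action count are hypothetical, chosen only for illustration.

```python
import numpy as np

def average_regret(payoffs, played):
    """Average external regret of a play sequence in hindsight.

    payoffs : (T, n_actions) array; payoffs[t, a] is the payoff the agent
              would have received at step t had it played action a
              (against whatever the opponents actually did).
    played  : length-T array of the action indices actually played.

    "No-regret" learning means this quantity tends to 0 (or below) as T grows.
    """
    T = payoffs.shape[0]
    realized = payoffs[np.arange(T), played].sum()
    best_fixed = payoffs.sum(axis=0).max()   # best single action in hindsight
    return (best_fixed - realized) / T

# Hypothetical example: 2 actions, 5 rounds.
rng = np.random.default_rng(0)
payoffs = rng.random((5, 2))
played = rng.integers(0, 2, size=5)
print(average_regret(payoffs, played))
```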

Cited by 14 publications (20 citation statements) | References 16 publications (21 reference statements)
“…Several MARL algorithms have been proposed and studied [5,11,18,22], all of which have some theoretical results of convergence in general-sum games. A common assumption of these algorithms is that an agent (or player) knows its own payoff matrix.…”
Section: Policy Learning Using PGA-APP (mentioning)
confidence: 99%
“…A common assumption of these algorithms is that an agent (or player) knows its own payoff matrix. To guarantee convergence, each algorithm has its own additional assumptions, such as requiring an agent to know a Nash equilibrium and the strategy of the other players [5,11,18], or to observe what actions other agents executed and what rewards they received [18,22]. For practical applications, these assumptions are very constraining and unlikely to hold; instead, an agent can only observe the immediate reward after selecting and performing an action.…”
Section: Policy Learning Using PGA-APP (mentioning)
confidence: 99%
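The weaker observability setting described at the end of this quote, where the agent sees only its own immediate reward, can be illustrated with a policy-hill-climbing-style update. This is a generic sketch of that style of learner, not the PGA-APP algorithm itself, and the class name and parameter values are hypothetical.

```python
import numpy as np

class PHCStyleLearner:
    """Minimal policy-hill-climbing-style learner for a repeated matrix game.

    It needs only the index of the action it played and the immediate reward
    it observed -- no payoff matrix, no view of the opponents' actions or
    rewards -- which is the weaker observability setting the quote describes.
    (Generic sketch; not the PGA-APP algorithm itself.)
    """
    def __init__(self, n_actions, alpha=0.1, delta=0.01):
        self.q = np.zeros(n_actions)                    # action-value estimates
        self.pi = np.full(n_actions, 1.0 / n_actions)   # mixed strategy
        self.alpha, self.delta = alpha, delta

    def act(self, rng):
        return rng.choice(len(self.pi), p=self.pi)

    def update(self, action, reward):
        # Value update from the immediate reward only (stateless Q-learning).
        self.q[action] += self.alpha * (reward - self.q[action])
        # Move probability mass toward the currently best-valued action.
        best = int(np.argmax(self.q))
        self.pi -= self.delta / (len(self.pi) - 1)
        self.pi[best] += self.delta + self.delta / (len(self.pi) - 1)
        self.pi = np.clip(self.pi, 0.0, None)
        self.pi /= self.pi.sum()

# Hypothetical usage; the reward would come from the environment.
rng = np.random.default_rng(0)
learner = PHCStyleLearner(n_actions=2)
a = learner.act(rng)
learner.update(a, reward=1.0)
```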
“…In a recent work, Chakraborty and Sen [21] proposed modeling the learning environments induced by gradient ascent learners (WoLF-IGA, WoLF-PHC [7] and ReDVaLeR [6]) as MDPs. In the presence of a gradient ascent adversary, the learning algorithm, called MB-AIM-FSI, first creates a set of hypotheses about the model of the learning environment that can be induced by the learning adversary.…”
Section: Related Work (mentioning)
confidence: 99%
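The idea referenced here is that an opponent following a known gradient-ascent rule makes the induced learning environment Markovian: the opponent's current mixed strategy acts as the state, and our own strategy choice determines the reward and the next state. Below is a rough sketch of that idea for a 2x2 game with an IGA-style opponent; it is not the MB-AIM-FSI algorithm, and the payoff matrices and step size are hypothetical.

```python
import numpy as np

A = np.array([[3.0, 0.0], [5.0, 1.0]])   # our payoff matrix (row player)
B = np.array([[3.0, 5.0], [0.0, 1.0]])   # opponent's payoff matrix
ETA = 0.05                                # opponent's gradient step size

def step(q_opp, p_our):
    """One 'MDP transition': given the opponent's strategy q_opp (probability
    of its action 0) and our strategy p_our, return (our expected reward,
    the opponent's next strategy under its gradient-ascent update)."""
    p = np.array([p_our, 1 - p_our])
    q = np.array([q_opp, 1 - q_opp])
    reward = p @ A @ q
    # Gradient of the opponent's expected payoff with respect to q_opp.
    grad = p @ (B[:, 0] - B[:, 1])
    q_next = float(np.clip(q_opp + ETA * grad, 0.0, 1.0))
    return reward, q_next

# A fixed policy of ours induces a deterministic trajectory of opponent states.
q = 0.5
for _ in range(3):
    r, q = step(q, p_our=0.8)
    print(round(r, 3), round(q, 3))
```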
“…Certain approaches also require that the actions taken by the other agents be observable [2,3]. Finally, in less realistic settings, the strategies (i.e., the probability distributions over actions) or the rewards obtained by the other agents are also assumed to be observable [4][5][6].…”
Section: Introduction (mentioning)
confidence: 99%
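What observing the other agents' actions (the first assumption in this quote) typically buys is an empirical model of their mixed strategy, which can then be best-responded to, fictitious-play style. A minimal sketch follows; the payoff matrix and observed action sequence are hypothetical.

```python
import numpy as np

A = np.array([[3.0, 0.0], [5.0, 1.0]])    # our payoffs: rows = our actions

counts = np.ones(2)                         # Laplace-smoothed opponent counts

def observe(opp_action):
    # Record one observed opponent action.
    counts[opp_action] += 1

def best_response():
    q_hat = counts / counts.sum()           # estimated opponent mixed strategy
    return int(np.argmax(A @ q_hat))        # our best action against it

for opp_a in [0, 1, 1, 1, 0]:               # hypothetical observed actions
    observe(opp_a)
print(best_response())
```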
“…Multi-Agent Reinforcement Learning (MARL) [5] provides a common approach for solving multi-object decision-making problems, allowing objects to adapt dynamically to changes in the IoT environment. Several MARL algorithms have been proposed in [4,[6][7][8], all of which have some theoretical results of convergence in general-sum games. A common assumption of these algorithms is that a player knows its own payoff matrix.…”
Section: Introduction (mentioning)
confidence: 99%