Homo Egualis Reinforcement Learning Agents for Load Balancing

Verbeeck, Katja; Parent, Johan; Nowé, Ann

doi:10.1007/978-3-540-45173-0_6

Cited by 5 publications

(7 citation statements)

References 4 publications

“…In a common interest game, ESRL is able to find one of the Pareto optimal solutions of the game. In a conflicting interest game, we show that ESRL agents learn optimal fair, possibly periodical policies [17,26]. Important to know is that ESRL agents are independent in the sense that they only use their own action choices and rewards to base their decisions on, that ESRL agents are flexible in learning different solution concepts and they can handle both stochastic, possible delayed rewards and asynchronous action selection.…”

Section: Introduction (mentioning)

confidence: 98%

“…Important to know is that ESRL agents are independent in the sense that they only use their own action choices and rewards to base their decisions on, that ESRL agents are flexible in learning different solution concepts and they can handle both stochastic, possible delayed rewards and asynchronous action selection. In [26] a job scheduling experiment is solved by conflicting interest ESRL agents. In this paper, we describe the problem of adaptive load-balancing parallel applications, handled by ESRL agents as a common interest game.…”

Section: Introduction (mentioning)

confidence: 99%

Exploring selfish reinforcement learning in repeated games with stochastic rewards

Verbeeck, Nowé, Parent, et al. 2006

Auton Agent Multi-Agent Syst

Self Cite

In this paper we introduce a new multi-agent reinforcement learning algorithm, called exploring selfish reinforcement learning (ESRL). ESRL allows agents to reach optimal solutions in repeated non-zero sum games with stochastic rewards by using coordinated exploration. First, two ESRL algorithms are presented, one for common interest games and one for conflicting interest games. Both are based on the same idea: an agent explores by temporarily excluding some of its local actions from its private action space, giving the team of agents the opportunity to look for better solutions in a reduced joint action space. At a later stage these two algorithms are transformed into one generic algorithm that does not assume the type of the game is known in advance. ESRL is able to find the Pareto optimal solution in common interest games without communication. In conflicting interest games ESRL needs only limited communication to learn a fair periodical policy, resulting in a good overall policy. ESRL agents are independent in the sense that they base their decisions only on their own action choices and rewards; they are flexible in learning different solution concepts; and they can handle stochastic, possibly delayed rewards as well as asynchronous action selection. A real-life experiment, adaptive load balancing of parallel applications, is added.

K. Verbeeck, Computational Modeling Lab (COMO), Vrije Universiteit Brussel, Brussels, Belgium

“…In previous literature [17,18], distributed algorithms for discovering such sequences found suboptimal ones. We will show optimal solutions, albeit with non-distributed algorithms.…”

Section: Infinite-length Games (mentioning)

confidence: 99%

“…The starting point in our research on long-term fairness was the work in [18] on "periodic policies." Their reward model comes in the form of a normal-form game, but the players are actually cooperative learning agents (rather than self-interested).…”

Section: Related Work (mentioning)

confidence: 99%

“…The actions are chosen with replacement, and actions chosen early do not restrict what actions can be chosen later, or their rewards. This framework, borrowed from [18], is very similar to the repeated normal-form game framework from game theory, except there is a single decision maker that chooses actions for the good of all beneficiaries.…”

Section: Introduction (mentioning)

confidence: 99%

Long-term fairness with bounded worst-case losses

Balan, Richards, Luke 2009

Auton Agent Multi-Agent Syst

How does one repeatedly choose actions so as to be fairest to the multiple beneficiaries of those actions? We examine approaches to discovering sequences of actions for which the worst-off beneficiaries are treated maximally well, then secondarily the second-worst-off, and so on. We formulate the problem for the situation where the sequence of action choices continues forever; this problem may be reduced to a set of linear programs. We then extend the problem to situations where the game ends at some unknown finite time in the future. We demonstrate that an optimal solution is NP-hard, and present two good approximation algorithms.
