Multi-agent reinforcement learning as a rehearsal for decentralized planning

Kraemer, Landon; Banerjee, Bikramjit

doi:10.1016/j.neucom.2016.01.031

Cited by 255 publications

(126 citation statements)

References 9 publications

Supporting

Mentioning

119

Contrasting

Order By: Relevance

“…Reinforcement learning as a rehearsal (RLaR) (Kraemer & Banerjee, 2016) is a related approach, where rather than a complete demonstration, an external entity (which could be a human) informs the learning agents about the parts of their state spaces that are hidden from their view, enabling them to perform RL as a rehearsal, but the agents must learn policies that do not rely on the hidden parts of their states. In da Silva et al (2017), similar online feedback is exchanged among the agents themselves, but in this work we seek to limit any advice to be an off-line prior, thus limiting the need for communication during learning.…”

Section: Related Workmentioning

confidence: 99%

Team learning from human demonstration with coordination confidence

Banerjee

Vittanala

Taylor

2019

The Knowledge Engineering Review

Self Cite

View full text Add to dashboard Cite

Among an array of techniques proposed to speed-up reinforcement learning (RL), learning from human demonstration has a proven record of success. A related technique, called Human-Agent Transfer, and its confidence-based derivatives have been successfully applied to single-agent RL. This article investigates their application to collaborative multi-agent RL problems. We show that a first-cut extension may leave room for improvement in some domains, and propose a new algorithm called coordination confidence (CC). CC analyzes the difference in perspectives between a human demonstrator (global view) and the learning agents (local view) and informs the agents’ action choices when the difference is critical and simply following the human demonstration can lead to miscoordination. We conduct experiments in three domains to investigate the performance of CC in comparison with relevant baselines.

show abstract

Section: Related Workmentioning

confidence: 99%

Team learning from human demonstration with coordination confidence

Banerjee

Vittanala

Taylor

2019

The Knowledge Engineering Review

Self Cite

View full text Add to dashboard Cite

show abstract

“…The model involving the centralized training of decentralized policies has created considerable demand in the efficient training of multiple agents [19] [29]. This model can address the challenge of non-Markovian and nonstationary environments during learning [15] and can access the additional state information of other agents while promoting communication [31].…”

Section: Introductionmentioning

confidence: 99%

Cooperative Multi-Agent Reinforcement Learning With Approximate Model Learning

2020

View full text Add to dashboard Cite

In multi-agent reinforcement learning, it is essential for agents to learn communication protocol to optimize collaboration policies and to solve unstable learning problems. Existing methods based on actorcritic networks solve the communication problem among agents. However, these methods have difficulty in improving sample efficiency and learning robust policies because it is not easy to understand the dynamics and nonstationary of the environment as the policies of other agents change. We propose a method for learning cooperative policies in multi-agent environments by considering the communications among agents. The proposed method consists of recurrent neural network-based actor-critic networks and deterministic policy gradients to centrally train decentralized policies. The actor networks cause the agents to communicate using forward and backward paths and to determine subsequent actions. The critic network helps to train the actor networks by sending gradient signals to the actors according to their contribution to the global reward. To address issues with partial observability and unstable learning, we propose using auxiliary prediction networks to approximate state transitions and the reward function. We used multi-agent environments to demonstrate the usefulness and superiority of the proposed method by comparing it with existing multi-agent reinforcement learning methods, in terms of both learning efficiency and goal achievements in the test phase. The results demonstrate that the proposed method outperformed other alternatives. INDEX TERMS reinforcement learning, model-free method, multi-agent system, multi-agent cooperation, actor-critic method, deterministic policy gradient SECTION I.

show abstract

“…As a result, multiple agents perform RL individually. However, a distributed architecture suffers from the moving target problem [24], where the behavior of each agent can impact on behaviors of other agents. On the contrary, the centralized architecture used in this paper assumes one agent only controlling all cells in the mobile network.…”

Section: Introduction a Backgroundmentioning

confidence: 99%

Deep Convolutional Neural Network Assisted Reinforcement Learning Based Mobile Network Power Saving

Wang

Bai

2020

IEEE Access

View full text Add to dashboard Cite

This paper addresses the power saving problem in mobile networks. Base station (BS) power and network traffic volume (NTV) models are first established. The BS power is modeled based on in-house equipment measurement by sampling different BS load configurations. The NTV model is built based on traffic data in the literature. Then, a threshold-based adaptive power saving method is discussed, serving as the benchmark. Next, a BS power control framework is created using Q-learning. The action-state function of the Q-learning is approximated via a deep convolutional neural network (DCNN). The DCNN-Q agent is designed to control the loads of cells in order to adapt to NTV variations and reduce power consumption. The DCNN-Q power saving framework is trained and simulated in a heterogeneous network including macrocells and microcells. It can be concluded that with the proposed DCNN-Q method, the power saving outperforms the threshold-based method. INDEX TERMS Power saving, deep convolutional neural network, reinforcement learning.

show abstract

Multi-agent reinforcement learning as a rehearsal for decentralized planning

Cited by 255 publications

References 9 publications

Team learning from human demonstration with coordination confidence

Team learning from human demonstration with coordination confidence

Cooperative Multi-Agent Reinforcement Learning With Approximate Model Learning

Deep Convolutional Neural Network Assisted Reinforcement Learning Based Mobile Network Power Saving

Contact Info

Product

Resources

About