This article reviews recent advances in multi-agent reinforcement learning algorithms that learn to communicate and cooperate in large-scale control systems and communication networks. We provide an overview of this emerging field, with an emphasis on the decentralized setting under different coordination protocols. We trace the evolution of reinforcement learning algorithms from single-agent to multi-agent systems from a distributed optimization perspective, and conclude with future directions and challenges, in the hope of catalyzing the growing synergy among the distributed optimization, signal processing, and reinforcement learning communities.
I. INTRODUCTION

Fueled by recent advances in deep neural networks, reinforcement learning (RL) has been in the limelight for many recent breakthroughs in artificial intelligence, including defeating humans in games (e.g., chess, Go, StarCraft), self-driving cars, smart home automation, and service robots, among many others. Despite these remarkable achievements, many basic tasks can still elude a single RL agent. Examples abound, from multi-player games, multi-robot systems, cellular antenna tilt control, traffic control systems, and smart power grids to network management. In such settings, cooperation among multiple RL agents is often critical: multiple agents must collaborate to complete a common goal, expedite learning, protect privacy, offer resiliency against failures and adversarial attacks, and overcome the physical limitations of a single RL agent acting alone. These tasks are studied under the umbrella of cooperative multi-agent RL (MARL), in which agents seek to learn optimal policies that maximize a shared team reward while interacting with an unknown stochastic environment and with each other. Cooperative MARL is far more challenging than the single-agent case due to: i) the exponentially growing search space, ii) the non-stationary and unpredictable environment caused by