Cellular networks are growing in complexity at an increasing pace, and the environments in which they are deployed are becoming denser. Traditional control methods fall short of providing a scalable and dynamic way of adapting the network to new conditions. Distributed multi-agent reinforcement learning successfully addresses the scalability problems of centralized approaches, but how to achieve learning with constraint satisfaction in distributed systems remains an open question in the state of the art. In this work, we perform distributed multi-agent constrained reinforcement learning to learn a policy online while satisfying imposed constraints. We use a coordination graph to model the interactions between agents and to decompose the global value function. A conservative safety critic is trained in parallel to evaluate the safety of proposed actions. Our method allows the safety critic and the value network to be trained independently of each other, and hence offers flexibility in how and when the different models are trained. The results are compared against a baseline that uses no safety critic. Simulations show that the agents learn a policy that satisfies the constraints while still maximizing the objective.
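As a sketch of the kind of decomposition a coordination graph enables (the notation below is ours, following the standard pairwise factorization from the coordination-graph literature, not an equation taken from this work), the global value function splits into per-agent and per-edge terms:
\[
Q(s, \mathbf{a}) \;=\; \sum_{i \in \mathcal{V}} Q_i(s, a_i) \;+\; \sum_{(i,j) \in \mathcal{E}} Q_{ij}(s, a_i, a_j),
\]
where \(\mathcal{V}\) is the set of agents, \(\mathcal{E}\) the set of edges of the coordination graph, and \(a_i\) the action of agent \(i\). Each term depends on at most two agents' actions, which is what makes distributed maximization tractable.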
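To make the safety-critic idea concrete, the following is a minimal sketch, not the paper's implementation: it assumes a CQL-style conservative penalty on a cost critic, and all names (SafetyCritic, conservative_critic_loss, alpha) and network sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SafetyCritic(nn.Module):
    """Estimates the expected cumulative constraint cost Q_C(s, a). Hypothetical sketch."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def conservative_critic_loss(critic, s, a, cost, s_next, a_next, a_policy,
                             gamma: float = 0.99, alpha: float = 1.0):
    """TD loss on observed costs plus a conservative penalty (assumed form)."""
    with torch.no_grad():
        target = cost + gamma * critic(s_next, a_next)  # one-step cost target
    td_loss = ((critic(s, a) - target) ** 2).mean()
    # Minimizing this term pushes Q_C down on logged actions and up on
    # actions proposed by the current policy, so the critic errs on the
    # side of overestimating risk for unseen actions.
    conservative_term = critic(s, a).mean() - critic(s, a_policy).mean()
    return td_loss + alpha * conservative_term
```

Because this critic has its own loss and data, it can be trained on a separate schedule from the value network, which is the flexibility the abstract refers to; at execution time its cost estimate can be compared against a constraint budget to screen proposed actions.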