Deep Contextual Bandits for Orchestrating Multi-User MISO Systems with Multiple RISs

Fresnedo,

IEEE Trans. Veh. Technol.

et al. 2023

The combination of multiple-input multiple-output (MIMO) and intelligent reflecting surfaces (IRSs) is foreseen as a key enabler of beyond 5G (B5G) and 6G. In this work, two different approaches are considered for the joint optimization of the IRS phase-shift matrix and MIMO precoders of an IRS-assisted multi-stream (MS) multi-user MIMO (MU-MIMO) system with the aim of maximizing the system sum-rate for every channel realization. The first one is a novel contextual bandit (CB) approach with continuous state and action spaces called deep contextual bandit-oriented deep deterministic policy gradient (DCB-DDPG). The second is an innovative deep reinforcement learning (DRL) formulation where the states, actions and rewards are selected such that the Markov decision process (MDP) property of reinforcement learning (RL) is properly met. Both proposals perform remarkably better than state-of-the-art heuristic methods in high multi-user interference scenarios.

Deep Contextual Bandit and Reinforcement Learning for IRS-Assisted MU-MIMO Systems

Fresnedo,

IEEE Trans. Veh. Technol.

et al. 2023

Deep Contextual Bandit and Reinforcement Learning for IRS-assisted MU-MIMO Systems

Fresnedo,

et al. 2022

Preprint
