2020
DOI: 10.1109/access.2020.3038923
Deep Reinforcement Based Power Allocation for the Max-Min Optimization in Non-Orthogonal Multiple Access

Cited by 12 publications (10 citation statements)
References 34 publications
“…MDPs consist of states, actions, and a reward function definition. In [21], the state of an MDP model includes the power-coefficient values, the data rate of users, and vectors indicating which of the power coefficients can be increased or decreased. In [14], the channel conditions and transmission power are considered as state and action, respectively, and both of them have continuous space.…”
Section: A. Motivation and Contribution (mentioning)
confidence: 99%
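As a rough illustration of the MDP state described for [21] in the statement above, the sketch below groups the power coefficients, per-user data rates, and the increase/decrease indicator vectors into one state object. The class name, field names, and shapes are assumptions made here for illustration, not definitions taken from the cited paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class NomaMdpState:
    """Illustrative MDP state (names and shapes are assumptions)."""
    power_coeffs: np.ndarray   # current power-allocation coefficient of each user
    data_rates: np.ndarray     # current data rate of each user
    can_increase: np.ndarray   # boolean mask: which coefficients may still be increased
    can_decrease: np.ndarray   # boolean mask: which coefficients may still be decreased

    def as_vector(self) -> np.ndarray:
        # Flatten into a single feature vector, as a DRL agent would consume it.
        return np.concatenate([
            self.power_coeffs,
            self.data_rates,
            self.can_increase.astype(float),
            self.can_decrease.astype(float),
        ])

# Hypothetical two-user example.
state = NomaMdpState(
    power_coeffs=np.array([0.7, 0.3]),
    data_rates=np.array([1.2, 0.9]),
    can_increase=np.array([False, True]),
    can_decrease=np.array([True, False]),
)
print(state.as_vector())
```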
“…The design of the reward ρ(s_n, a_n) is related to the goal of the original design problem (P1), which attempts to maximize the worst sum rate of WNs. Here, we consider three kinds of reward designs for ρ(s_n, a_n), namely the worst accumulated sum rate among users (WASR), the difference of the worst accumulated sum rate among users at two adjacent time slots (DWASR) [29], and the instantaneous average sum rate of users (ISR), which are in turn defined as follows:…”
Section: Reward Design (mentioning)
confidence: 99%
“…represents the accumulated sum data rate for the kth WN up to the (n−1)th time, and R_{k,n} is calculated by (16) with respect to the action a_n and the state s_n at the time instant n. The idea of the WASR method is to simply use the worst accumulated sum rate of the WNs as the reward, according to the objective function of the optimization problem (P1). The DWASR method is conceptualized in accordance with [29], for which the difference of the worst accumulated sum rate at two adjacent time slots n and n − 1 is computed as the reward. The ISR method is proposed in this paper to take the instantaneous average sum rate of all WNs as the reward.…”
Section: Reward Design (mentioning)
confidence: 99%
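A minimal sketch of the three reward designs quoted above (WASR, DWASR, ISR), under the assumption that `acc_rates_prev` holds the accumulated sum rate of each WN up to slot n−1 and `inst_rates` the instantaneous rates R_{k,n} at slot n; the function and argument names are illustrative, and the exact expressions in the paper's equations may differ.

```python
import numpy as np

def wasr_reward(acc_rates_prev: np.ndarray, inst_rates: np.ndarray) -> float:
    """WASR: worst accumulated sum rate among users at slot n."""
    return float(np.min(acc_rates_prev + inst_rates))

def dwasr_reward(acc_rates_prev: np.ndarray, inst_rates: np.ndarray) -> float:
    """DWASR: difference of the worst accumulated sum rate at slots n and n-1."""
    return float(np.min(acc_rates_prev + inst_rates) - np.min(acc_rates_prev))

def isr_reward(inst_rates: np.ndarray) -> float:
    """ISR: instantaneous average sum rate over all WNs at slot n."""
    return float(np.mean(inst_rates))
```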
“…The number of neurons in the input layer for the DNNs is 2(1 + L) + K, as we explained in the description of the state s^t_{n,k}; see (17). We set L = 20; therefore, there are 52 neurons in the input layer.…”
Section: B. Hyper-parameter Selection (mentioning)
confidence: 99%
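A quick arithmetic check of the quoted input-layer size 2(1 + L) + K: with L = 20, the stated 52 input neurons implies K = 10 WNs. That value of K is an inference from the excerpt, not an explicit statement in it.

```python
# Input-layer size check for the DNNs: 2 * (1 + L) + K neurons.
L = 20                            # value quoted in the excerpt
K = 10                            # inferred so that the total matches the quoted 52
input_neurons = 2 * (1 + L) + K
assert input_neurons == 52
print(f"DNN input-layer size: {input_neurons} neurons")
```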
“…In reinforcement learning, the idea is to determine how agents ought to take actions by observing an environment such that a cumulative reward is maximized [13]. Several deep reinforcement learning (DRL)-based methods have been applied to solve power optimization problems in cellular networks, e.g., [14]- [17]. However, none of these works considered massive MIMO systems.…”
Section: Introduction (mentioning)
confidence: 99%