As a technology that can accommodate more users and significantly improve spectral efficiency, non-orthogonal multiple access (NOMA) has attracted considerable attention in recent years. The basic idea of NOMA is to implement multiple access in the power domain and decode the desired signal via successive interference cancellation (SIC). However, the resource allocation problem in such a NOMA system is non-convex and difficult to solve directly with conventional methods. We therefore propose a reinforcement learning (RL) approach based on cooperative Q-learning to solve the resource allocation problem in multi-antenna downlink NOMA systems. First, we formulate the resource allocation process as a sum-rate maximization problem subject to power budget and quality of service (QoS) constraints. Second, we design a reward function that improves the sum rate while meeting the power and capacity constraints; multiple Q-tables are created and cooperatively updated to obtain the optimal beamforming matrix. We then analyze the convergence of the proposed RL-based power allocation method. Our simulations show that the proposed power allocation scheme yields excellent performance in terms of sum rate, energy efficiency, and spectral efficiency.

INDEX TERMS NOMA, power allocation, reinforcement learning, sum rate, spectral efficiency.
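The cooperative Q-learning idea summarized in the abstract can be illustrated with a minimal, self-contained toy sketch: each user keeps its own Q-table over discretized transmit-power levels, every agent receives the shared sum-rate reward (with a penalty when the power budget or QoS floor is violated), and all Q-tables are updated from that common signal. All parameter values, channel gains, and the reward shaping below are illustrative assumptions for a two-user downlink, not the paper's actual system model or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Assumed toy system parameters (not from the paper) ---
N_USERS = 2
POWER_LEVELS = np.array([0.2, 0.5, 0.8, 1.0])  # discretized actions
P_BUDGET = 1.5                                  # total power budget
R_MIN = 0.1                                     # per-user QoS rate floor
GAINS = np.array([1.0, 0.4])                    # fixed channel gains
NOISE = 0.1

def sum_rate(powers):
    """Downlink NOMA rates with SIC ordered by channel gain: the strong
    user (index 0) cancels the weak user's signal; the weak user treats
    the strong user's signal as interference."""
    rates = np.empty(N_USERS)
    rates[0] = np.log2(1 + GAINS[0] * powers[0] / NOISE)
    rates[1] = np.log2(1 + GAINS[1] * powers[1] /
                       (GAINS[1] * powers[0] + NOISE))
    return rates

def reward(powers):
    """Shared reward: sum rate if budget and QoS are met, else a penalty."""
    rates = sum_rate(powers)
    if powers.sum() > P_BUDGET or (rates < R_MIN).any():
        return -1.0
    return rates.sum()

# One Q-table per user over its own action set (stateless, bandit-style toy).
Q = np.zeros((N_USERS, len(POWER_LEVELS)))
alpha, eps = 0.1, 0.2

for episode in range(5000):
    # epsilon-greedy action selection for each agent
    actions = np.array([
        rng.integers(len(POWER_LEVELS)) if rng.random() < eps
        else int(np.argmax(Q[u]))
        for u in range(N_USERS)
    ])
    r = reward(POWER_LEVELS[actions])
    # cooperative update: every agent learns from the same shared reward
    for u in range(N_USERS):
        Q[u, actions[u]] += alpha * (r - Q[u, actions[u]])

best = POWER_LEVELS[np.argmax(Q, axis=1)]
print("learned powers:", best, "sum rate:", round(reward(best), 3))
```

The penalty term encodes the constraints softly, so greedy actions drift toward feasible, high-sum-rate power pairs; the paper's actual scheme additionally optimizes a beamforming matrix, which this stateless toy omits.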