Applications of Deep Reinforcement Learning in Communications and Networking: A Survey

Hassan

IEEE Commun. Surv. Tutorials

et al. 2020

273

159

Internet-of-Things (IoT) refers to a massively heterogeneous network formed through smart devices connected to the Internet. In the wake of disruptive IoT with its huge amount and variety of data, Machine Learning (ML) and Deep Learning (DL) mechanisms will play a pivotal role in bringing intelligence to IoT networks. Among other aspects, ML and DL can play an essential role in addressing the challenges of resource management in large-scale IoT networks. In this article, we conduct a systematic and in-depth survey of the ML- and DL-based resource management mechanisms in cellular wireless and IoT networks. We start with the challenges of resource management in cellular IoT and low-power IoT networks, review the traditional resource management mechanisms for IoT networks, and motivate the use of ML and DL techniques for resource management in these networks. Then, we provide a comprehensive survey of the existing ML- and DL-based resource allocation techniques in wireless IoT networks and also techniques specifically designed for HetNets, MIMO and D2D communications, and NOMA networks. To this end, we also identify the future research directions in using ML and DL for resource allocation and management in IoT networks.

Section: Machine Learning and Deep Learning For Resource Management

mentioning

confidence: 99%

Machine Learning for Resource Management in Cellular and IoT Networks: Potentials, Current Solutions, and Open Challenges

IEEE Trans. Cogn. Commun. Netw.

“…Inspired by the achievements of reinforcement learning in dynamic control problems, such as the game of Atari [16], and AlphaGo [17], there has been increased interest in seeking reinforcement learning based solutions for problems in wireless communications. As summarized in [18] and [19], deep reinforcement learning algorithms have been applied in various wireless settings. For example, the authors in [20] and [21] investigate the use of Q-learning and SARSA (state-action-reward-state-action) reinforcement learning, respectively, in power control.…”

Section: Related Work

mentioning

confidence: 99%

A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access

Zhong

Gursoy

et al. 2019

To make efficient use of limited spectral resources, in this work we propose a deep actor-critic reinforcement learning based framework for dynamic multichannel access. We consider both a single-user case and a scenario in which multiple users attempt to access channels simultaneously. We employ the proposed framework as a single agent in the single-user case, and extend it to a decentralized multi-agent framework in the multi-user scenario. In both cases, we develop algorithms for the actor-critic deep reinforcement learning and evaluate the proposed learning policies via experiments and numerical results. In the single-user model, in order to evaluate the performance of the proposed channel access policy and the framework's tolerance against uncertainty, we explore different channel switching patterns and different switching probabilities. In the case of multiple users, we analyze the probabilities of each user accessing channels with favorable channel conditions and the probability of collision. We also address a time-varying environment to identify the adaptive ability of the proposed framework. Additionally, we provide comparisons (in terms of both the average reward and time efficiency) between the proposed actor-critic deep reinforcement learning framework, Deep-Q network (DQN) based approach, random access, and the optimal policy when the channel dynamics are known.

2019 IEEE Globecom Workshops (GC Wkshps)

“…To address such a question, we adopt a reinforcement learning approach to learn the channel selection probabilities of a SU. Reinforcement learning (see, e.g., the book [13] and the recent survey [14]) is a field of machine learning that addresses the problems of how to behave in an environment by performing certain actions and observing the reward from those actions. In these problems, the fixed limited resources must be allocated to maximize their expected gain.…”

Section: Introduction

mentioning

confidence: 99%

A Reinforcement Learning Approach for the Multichannel Rendezvous Problem

Wang

Chang

et al. 2019

In this paper, we consider the multichannel rendezvous problem in cognitive radio networks (CRNs) where the probability that two users hopping on the same channel have a successful rendezvous is a function of channel states. The channel states are modelled by two-state Markov chains that have a good state and a bad state. These channel states are not observable by the users. For such a multichannel rendezvous problem, we are interested in finding the optimal policy to minimize the expected time-to-rendezvous (ETTR) among the class of dynamic blind rendezvous policies, i.e., at the t-th time slot each user selects channel i independently with probability p_i(t), i = 1, 2, ..., N. By formulating such a multichannel rendezvous problem as an adversarial bandit problem, we propose using a reinforcement learning approach to learn the channel selection probabilities p_i(t), i = 1, 2, ..., N. Our experimental results show that the reinforcement learning approach is very effective and yields comparable ETTRs when compared with various approximation policies in the literature.
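
The abstract above does not say which bandit algorithm is used to learn the channel selection probabilities p_i(t). As a minimal sketch only, the following assumes the standard Exp3 algorithm for adversarial bandits and a simplified single-user setting in which selecting channel i yields a reward of 1 with a fixed probability. The function names, the `good_prob` parameter, and the Bernoulli reward model are all hypothetical and are not taken from the paper.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Return p_i(t): normalized weights mixed with a uniform distribution."""
    total = sum(weights)
    n = len(weights)
    return [(1.0 - gamma) * w / total + gamma / n for w in weights]


def exp3_update(weights, probs, chosen, reward, gamma):
    """Update only the chosen channel using an importance-weighted reward."""
    n = len(weights)
    estimated = reward / probs[chosen]  # unbiased estimate of the reward
    new_weights = list(weights)
    new_weights[chosen] *= math.exp(gamma * estimated / n)
    return new_weights


def run_exp3(good_prob, n_slots=2000, gamma=0.1, seed=0):
    """Learn selection probabilities over len(good_prob) channels.

    good_prob[i] is an assumed (hypothetical) chance that choosing
    channel i in a slot gives reward 1; the learner never observes it.
    """
    rng = random.Random(seed)
    n = len(good_prob)
    weights = [1.0] * n
    for _ in range(n_slots):
        probs = exp3_probabilities(weights, gamma)
        chosen = rng.choices(range(n), weights=probs)[0]
        reward = 1.0 if rng.random() < good_prob[chosen] else 0.0
        weights = exp3_update(weights, probs, chosen, reward, gamma)
        # Rescale so the largest weight is 1; probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return exp3_probabilities(weights, gamma)


if __name__ == "__main__":
    learned = run_exp3([0.1, 0.9, 0.2])
    print([round(p, 3) for p in learned])
```

Under these assumptions, the learned probabilities concentrate on the channel with the highest reward rate, while the `gamma / n` term keeps every channel's selection probability strictly positive so the learner keeps exploring.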