Artificial Intelligence (AI)-enabled radios are expected to enhance the spectral efficiency of 5th generation (5G) millimeter wave (mmWave) networks by learning to optimize network resources. However, allocating resources over the mmWave band is extremely challenging due to rapidly varying channel conditions. We consider several resource allocation problems for mmWave radio networks under unknown channel statistics and without any channel state information (CSI) feedback: i) dynamic rate selection for an energy harvesting transmitter, ii) dynamic power allocation for heterogeneous applications, and iii) distributed resource allocation in a multi-user network. All of these problems exhibit structured payoffs that are unimodal functions over partially ordered arms (transmission parameters) as well as over partially ordered contexts (side information). Unimodality over arms helps reduce the number of arms to be explored, while unimodality over contexts helps reuse past information from nearby contexts to make better selections. We model this as a structured reinforcement learning problem, called the contextual unimodal multi-armed bandit (MAB), propose an online learning algorithm that exploits unimodality to optimize resource allocation over time, and prove that it achieves regret that is logarithmic in time. The regret of our algorithm scales sublinearly in both the number of arms and the number of contexts for a wide range of scenarios. We also show via simulations that our algorithm significantly improves performance in the aforementioned resource allocation problems.
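The idea of exploiting unimodality over arms can be illustrated with a minimal sketch (not the paper's algorithm; the leader-plus-neighbours rule, the exploration schedule, and the toy payoff below are all illustrative assumptions): because the mean payoff is unimodal over ordered arms, it suffices to explore only the empirical leader and its immediate neighbours.

```python
import random

# Illustrative sketch (not the paper's algorithm): unimodality over ordered
# arms lets the learner restrict exploration to the empirical leader and its
# immediate neighbours, since a unimodal mean payoff guarantees a path of
# improving arms towards the optimum.
def unimodal_bandit(reward_fn, n_arms, horizon, seed=0):
    rng = random.Random(seed)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    choices = []
    for t in range(horizon):
        # Empirical means; unexplored arms become the leader once each.
        means = [sums[a] / counts[a] if counts[a] else float("inf")
                 for a in range(n_arms)]
        leader = max(range(n_arms), key=lambda a: means[a])
        # Unimodality: exploring the leader's neighbourhood is enough.
        neighbourhood = [a for a in (leader - 1, leader, leader + 1)
                         if 0 <= a < n_arms]
        if t % 4 == 0:  # periodically explore the least-pulled neighbour
            arm = min(neighbourhood, key=lambda a: counts[a])
        else:           # otherwise exploit the leader
            arm = leader
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        sums[arm] += reward
        choices.append(arm)
    return choices

# Toy unimodal payoff: success probability peaks at arm 3, decays linearly.
def toy_reward(arm, rng):
    p = max(0.0, 1.0 - 0.2 * abs(arm - 3))
    return 1.0 if rng.random() < p else 0.0
```

The key point is that each round touches at most three arms, which is why the exploration cost need not grow with the total number of arms.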
We consider dynamic rate and channel adaptation in a cognitive radio network serving heterogeneous applications under dynamically varying channel availability and rate constraints. We formalize it as a Bayesian learning problem and propose a novel learning algorithm, called Volatile Constrained Thompson Sampling (V-CoTS), which treats each rate-channel pair as a two-dimensional action. The set of available actions varies dynamically over time due to variations in primary user activity and in the rate requirements of the applications served by the users. Our algorithm learns to adapt its rate and opportunistically exploit spectrum holes when the channel conditions are unknown and channel state information is absent, using acknowledgment-only feedback. It uses the monotonicity of the transmission success probability in the transmission rate to optimally trade off exploration and exploitation of the actions. Numerical results demonstrate that V-CoTS achieves significant throughput gains compared to state-of-the-art methods.
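A minimal sketch, in the spirit of (but not identical to) V-CoTS, shows how Thompson Sampling handles volatile two-dimensional actions with acknowledgment-only feedback; the function names and the Beta-posterior bookkeeping below are illustrative assumptions:

```python
import random

# Illustrative sketch inspired by (not identical to) V-CoTS: each
# (channel, rate) action keeps a Beta posterior over its ACK probability,
# and only the actions currently available compete in each round.
def ts_select(posteriors, available, rates, rng):
    # Sample a plausible success probability for each available action and
    # pick the one maximising sampled expected throughput = rate * p.
    best, best_val = None, -1.0
    for action in available:
        a, b = posteriors[action]
        p = rng.betavariate(a, b)
        val = rates[action] * p
        if val > best_val:
            best, best_val = action, val
    return best

def ts_update(posteriors, action, ack):
    # ACK/NACK is the only feedback: update the Beta posterior accordingly.
    a, b = posteriors[action]
    posteriors[action] = (a + 1, b) if ack else (a, b + 1)
```

In use, the caller passes whichever subset of actions is currently available, so primary user activity and changing rate requirements are reflected simply by shrinking or growing the `available` set between rounds.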
We consider the problem of dynamic rate selection in a cognitive radio network (CRN) over the millimeter wave (mmWave) spectrum. Specifically, we focus on the scenario in which the transmit power is time-varying, as motivated by the following applications: i) an energy harvesting CRN, in which the system relies solely on a harvested energy source, and ii) an underlay CRN, in which a secondary user (SU) restricts its transmission power based on a dynamically changing interference temperature limit (ITL) such that the primary user (PU) remains unharmed. Since the channel quality fluctuates very rapidly in mmWave networks, costly channel state information (CSI) is of limited use; we therefore cast rate adaptation over an mmWave channel as an online stochastic optimization problem and propose a Thompson Sampling (TS) based Bayesian method. Our method exploits the unimodality and monotonicity of the throughput with respect to rates and transmit powers, and achieves regret that is logarithmic in time with a leading term that is independent of the number of available rates. Our regret bound holds for any sequence of transmit powers and captures the dependence of the regret on the arrival pattern. We also show via simulations that the performance of the proposed algorithm is superior to that of state-of-the-art algorithms, especially when the arrivals are favorable.
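How monotonicity of the success probability in the rate can speed up learning is easy to sketch. Under the per-packet assumption common in rate adaptation models (a transmission succeeding at some rate would also succeed at every lower rate, and one failing would also fail at every higher rate; this is a modeling assumption, not the paper's exact update rule), a single ACK/NACK observation updates the posteriors of several rates at once:

```python
# Illustrative sketch (not the paper's exact update rule): `posteriors`
# holds Beta parameters (a, b) for each rate, indexed by increasing rate.
# Under the per-packet monotonicity assumption, one observation at the
# chosen rate updates a whole range of rates.
def monotone_update(posteriors, chosen, ack):
    if ack:
        # Success at the chosen rate implies success at all lower rates.
        for i in range(chosen + 1):
            a, b = posteriors[i]
            posteriors[i] = (a + 1, b)
    else:
        # Failure at the chosen rate implies failure at all higher rates.
        for i in range(chosen, len(posteriors)):
            a, b = posteriors[i]
            posteriors[i] = (a, b + 1)
```

Because each observation tightens many posteriors simultaneously, the exploration cost can become independent of the number of rates, consistent with a leading regret term that does not grow with the rate set.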
We consider the problem of distributed dynamic rate and channel selection in a multi-user network, in which each user selects a wireless channel and a modulation and coding scheme (corresponding to a transmission rate) in order to maximize the network throughput. We assume that the users are cooperative; however, there is no coordination or communication among them, and the number of users in the system is unknown. We formulate this problem as a multi-player multi-armed bandit problem and propose a decentralized learning algorithm that performs near-optimal exploration of the transmission rates in order to learn quickly. We prove that the regret of our learning algorithm with respect to the optimal allocation increases logarithmically over rounds, with a leading term that is logarithmic in the number of transmission rates. Finally, we compare the performance of our learning algorithm with the state of the art via simulations and show that it substantially improves throughput and minimizes the number of collisions.
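The collision-avoidance aspect of decentralized channel selection can be sketched with a simple "musical chairs"-style rule (an illustrative baseline from the multi-player bandit literature, not the paper's algorithm): a user keeps its channel after a collision-free round and re-samples uniformly after a collision.

```python
import random

# Illustrative "musical chairs"-style sketch of collision avoidance without
# any coordination (not the paper's algorithm): users that collide hop to a
# uniformly random channel; users that do not collide stay put. With more
# channels than users, all users quickly settle on distinct channels.
def simulate(n_users, n_channels, horizon, seed=0):
    rng = random.Random(seed)
    picks = [rng.randrange(n_channels) for _ in range(n_users)]
    total_collisions = 0
    for _ in range(horizon):
        occupancy = {}
        for user, channel in enumerate(picks):
            occupancy.setdefault(channel, []).append(user)
        for channel, users in occupancy.items():
            if len(users) > 1:  # collision: everyone on this channel hops
                total_collisions += len(users)
                for user in users:
                    picks[user] = rng.randrange(n_channels)
    return picks, total_collisions
```

Once the users have orthogonalized onto distinct channels, the configuration is absorbing and no further collisions occur, which is why the collision count stays bounded even without any message exchange.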