We consider a class of restless multi-armed bandit (RMAB) problems with unknown arm dynamics. At each time, a player chooses one of N arms to play, referred to as the active arm, and receives a random reward from a finite set of reward states. The reward state of the active arm transitions according to unknown Markovian dynamics. The reward state of each passive arm (an arm not chosen to play at the given time) evolves according to an arbitrary unknown random process. The objective is an arm-selection policy that minimizes the regret, defined as the reward loss with respect to a player that always plays the most rewarding arm. This class of RMAB problems has been studied recently in the context of communication networks and financial investment applications. We develop a strategy that selects arms to be played in a consecutive manner, dubbed the Adaptive Sequencing Rules (ASR) algorithm. The sequencing rules for selecting arms under the ASR algorithm are adaptively updated and controlled by the current sample reward means. By judiciously designing the adaptive sequencing rules, we show that the ASR algorithm achieves a logarithmic regret order with time, and a finite-sample bound on the regret is established. Although existing methods have shown a logarithmic regret order with time in this RMAB setting, the theoretical analysis shows a significant improvement in the regret scaling with respect to the system parameters under ASR. Extensive simulation results support the theoretical study and demonstrate strong performance of the algorithm as compared to existing methods.
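To make the regret definition in the abstract above concrete, here is a minimal Python sketch. It is a simplification and not the ASR algorithm: it assumes each arm can be summarized by a single long-run mean reward, and the function name `expected_regret` and the example numbers are hypothetical.

```python
from typing import Sequence


def expected_regret(mean_rewards: Sequence[float], plays: Sequence[int]) -> float:
    """Reward loss relative to always playing the arm with the highest mean.

    Assumption (not from the source): each arm is summarized by one
    long-run mean reward, ignoring the Markovian state dynamics.
    """
    best_mean = max(mean_rewards)
    # Each play of a suboptimal arm adds its gap to the best arm.
    return sum(best_mean - mean_rewards[arm] for arm in plays)


# Hypothetical example: three arms, six plays (arm indices).
means = [0.2, 0.5, 0.9]
plays = [0, 1, 2, 2, 2, 2]
regret = expected_regret(means, plays)
```

In this toy run only the first two plays pick suboptimal arms, so only they contribute to the regret. A policy with logarithmic regret keeps the number of such suboptimal plays growing logarithmically with time.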
The problem of detecting anomalies in multiple processes is considered. We consider a composite hypothesis case, in which the measurements drawn when observing a process follow a common distribution with an unknown parameter (vector), whose value lies in normal or abnormal parameter spaces, depending on its state. The objective is a sequential search strategy that minimizes the expected detection time subject to an error probability constraint. We develop a deterministic search algorithm with the following desired properties. First, when no additional side information on the process states is known, the proposed algorithm is asymptotically optimal in terms of minimizing the detection delay as the error probability approaches zero. Second, when the parameter value under the null hypothesis is known and equal for all normal processes, the proposed algorithm is asymptotically optimal as well, with better detection time determined by the true null state. Third, when the parameter value under the null hypothesis is unknown, but is known to be equal for all normal processes, the proposed algorithm is consistent in terms of achieving error probability that decays to zero with the detection delay. Finally, an explicit upper bound on the error probability under the proposed algorithm is established for the
Bar Hemo, Tomer Gafni and Kobi Cohen are with the
We consider the problem of multi-user spectrum access in wireless networks. The bandwidth is divided into K orthogonal channels, and M users aim to access the spectrum. Each user chooses a single channel for transmission at each time slot. The state of each channel is modeled by a restless unknown Markovian process. Previous studies have analyzed a special case of this setting, in which each channel yields the same expected rate for all users. By contrast, we consider a more general and practical model, where each channel yields a different expected rate for each user. This model adds a significant challenge of how to efficiently learn a channel allocation in a distributed manner to yield a global system-wide objective. We adopt the stable matching utility as the system objective, which is known to yield strong performance in multichannel wireless networks, and develop a novel Distributed Stable Strategy Learning (DSSL) algorithm to achieve the objective. We prove theoretically that DSSL converges to the stable matching allocation, and the regret, defined as the loss in total rate with respect to the stable matching solution, has a logarithmic order with time. Finally, simulation results demonstrate the strong performance of the DSSL algorithm.
I. INTRODUCTION
We consider the spectrum access problem, where a shared bandwidth is divided into K orthogonal channels (i.e., subbands), and M users want to access the spectrum, where K ≥ M. Each channel is modeled by a Finite-State Markovian Channel (FSMC), which is independent and non-identically distributed across channels. The FSMC is a tractable model widely used to capture the time-varying behavior of a radio communication channel [2], [3]. It is often employed to model radio channel dynamics due to primary user occupancy effects in hierarchical cognitive radio networks (where the M secondary (unlicensed) users are cognitive in terms of learning and adapting good access strategies), or the external interference effects in the open sharing model among M users in the wireless network (e.g., ISM band) [4], [5]. At each time step, each user experiences a different transmission rate over each channel depending on its FSMC distribution, where the FSMC parameters (i.e., the transition probabilities that govern the Markov chain) are unknown. At each time step, each user is allowed to choose one channel to access, and observe the instantaneous channel state. If two or more users access the same channel at the same time, a collision occurs and the achievable rate is zero.
We adopt the stable matching utility (see Section II for details) as the system objective, which is known to yield strong
Tomer Gafni and Kobi Cohen are with the School of Electrical and Computer Engineering,
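As a rough illustration of the stable matching objective in the spectrum access setting above, the following Python sketch computes a stable user-to-channel allocation from a known rate matrix. It is not the DSSL algorithm, which learns the allocation in a distributed manner from unknown dynamics. It assumes, for illustration only, that both users and channels prefer the pairing with the higher expected rate. The source defers the exact definition to Section II, and the names `greedy_stable_matching`, `is_stable`, and the example rates are hypothetical.

```python
from typing import Dict, List


def greedy_stable_matching(rates: List[List[float]]) -> Dict[int, int]:
    """Map each user to a distinct channel by repeatedly taking the
    highest-rate pair among still-unmatched users and channels.

    rates[m][k] is the (assumed known) expected rate of user m on channel k.
    """
    num_users, num_channels = len(rates), len(rates[0])
    if num_channels < num_users:
        raise ValueError("need at least as many channels as users (K >= M)")
    # All (rate, user, channel) triples, highest rate first.
    pairs = sorted(
        ((rates[m][k], m, k) for m in range(num_users) for k in range(num_channels)),
        reverse=True,
    )
    matching: Dict[int, int] = {}
    used_channels = set()
    for _, m, k in pairs:
        if m not in matching and k not in used_channels:
            matching[m] = k
            used_channels.add(k)
    return matching


def is_stable(rates: List[List[float]], matching: Dict[int, int]) -> bool:
    """Check that no user-channel pair would both gain by deviating."""
    owner = {k: m for m, k in matching.items()}
    for m, k_cur in matching.items():
        for k in range(len(rates[0])):
            if rates[m][k] > rates[m][k_cur]:
                # Channel k accepts m if it is free or m offers a higher rate.
                if k not in owner or rates[m][k] > rates[owner[k]][k]:
                    return False
    return True


# Hypothetical example: M = 2 users, K = 3 channels.
rates = [[10.0, 6.0, 3.0], [9.0, 8.0, 2.0]]
matching = greedy_stable_matching(rates)
```

Because each user ends up on a distinct channel, the allocation avoids the collisions described above, in which the achievable rate drops to zero.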