2018
DOI: 10.1109/tnnls.2018.2806006
An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem

Abstract: We investigate the adversarial multiarmed bandit problem and introduce an online algorithm that asymptotically achieves the performance of the best switching bandit arm selection strategy. Our algorithms are truly online such that we do not use the game length or the number of switches of the best arm selection strategy in their constructions. Our results are guaranteed to hold in an individual sequence manner, since we have no statistical assumptions on the bandit arm losses. Our regret bounds, i.e., our perf…
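As background for the adversarial bandit setting the abstract describes, the sketch below shows the standard Exp3 baseline (exponential weights with importance-weighted loss estimates). This is a well-known reference algorithm, not the paper's switching-arm method; the function name and parameters are illustrative.

```python
import math
import random

def exp3(n_arms, gamma, loss_fn, horizon):
    """Exp3 for adversarial bandits: maintain exponential weights and
    update only the played arm via an importance-weighted loss estimate.
    (Illustrative baseline, not the paper's algorithm.)"""
    weights = [1.0] * n_arms
    total_loss = 0.0
    for t in range(horizon):
        total_w = sum(weights)
        # Mix the weight distribution with uniform exploration of rate gamma.
        probs = [(1 - gamma) * w / total_w + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)           # adversarially chosen loss in [0, 1]
        total_loss += loss
        est = loss / probs[arm]          # unbiased importance-weighted estimate
        weights[arm] *= math.exp(-gamma * est / n_arms)
    return total_loss
```

Because only the selected arm's loss is observed, the division by `probs[arm]` keeps the loss estimate unbiased, which is what makes the exponential-weights analysis carry over from the full-information setting.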

Cited by 23 publications (27 citation statements)
References 42 publications
“…In RL, an agent discovers the best action (i.e., height) which yields the most reward (i.e., average cell throughput) through a process of trial and error. With the uniform user distribution and ring-based approximation elaborated in Figure 3, this scenario aligns perfectly with a Markov decision process (MDP) with a single state (i.e., a stationary environment), which can be optimally handled as an RL-based multi-armed bandit (MAB) problem [41]. The aim of MAB is to develop a learning policy that achieves maximal cumulative reward.…”
Section: ABS Height Optimization Using Reinforcement Learning
confidence: 95%
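The single-state MDP described in this citation statement reduces to a stochastic bandit, for which epsilon-greedy is the simplest learning policy. The sketch below is illustrative; the arm means, noise model, and parameter names are assumptions, not taken from the citing paper.

```python
import random

def epsilon_greedy(arm_means, epsilon, horizon, rng):
    """Epsilon-greedy bandit: with probability epsilon explore a uniformly
    random arm, otherwise exploit the empirically best arm.
    (Illustrative sketch with Gaussian reward noise.)"""
    n = len(arm_means)
    counts = [0] * n
    values = [0.0] * n          # running empirical mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            a = rng.randrange(n)                          # explore
        else:
            a = max(range(n), key=lambda i: values[i])    # exploit
        r = arm_means[a] + rng.gauss(0.0, 0.1)            # noisy reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]          # incremental mean
        total += r
    return total, counts
```

The incremental-mean update avoids storing reward histories, which matches the cumulative-reward objective the quoted passage attributes to MAB policies.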
“…Over the past years, the global optimization problem has garnered significant attention, with various algorithms being proposed in distinct fields of research. It has been studied especially in the fields of non-convex optimization [6]-[8], Bayesian optimization [9], convex optimization [10]-[12], bandit optimization [13], and stochastic optimization [14], [15], because of its practical applications in distribution estimation [16]-[19], multi-armed bandits [20]-[22], control theory [23], signal processing [24], game theory [25], prediction [26], [27], decision theory [28], and anomaly detection [29]-[31].…”
Section: A Motivation
confidence: 99%
“…In these types of applications, we encounter the fundamental dilemma of the exploration-exploitation trade-off, which is most thoroughly studied in the multi-armed bandit problem [43]. To that end, the study of the multi-armed bandit problem has received considerable attention over the years [32], [34], [35], [37], [39], [43]-[45], where the goal is to minimize a loss or maximize a reward in a problem environment by sequentially selecting one of M given actions [46].…”
Section: A Preliminaries
confidence: 99%
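The exploration-exploitation trade-off mentioned in this citation statement is often resolved in the stochastic setting by optimism: UCB1 adds a confidence bonus to each arm's empirical mean. The sketch below is a generic illustration with Bernoulli rewards, not code from any of the cited works.

```python
import math
import random

def ucb1(arm_means, horizon, rng):
    """UCB1: play each arm once, then repeatedly select the arm maximizing
    empirical mean + sqrt(2 ln t / n_i), trading off exploration (rarely
    pulled arms get a large bonus) and exploitation (high empirical means).
    (Illustrative sketch with Bernoulli rewards.)"""
    n = len(arm_means)
    counts = [0] * n
    values = [0.0] * n
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            a = t - 1                                    # initialize each arm once
        else:
            a = max(range(n),
                    key=lambda i: values[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        r = 1.0 if rng.random() < arm_means[a] else 0.0  # Bernoulli reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
        total += r
    return total, counts
```

Unlike epsilon-greedy, UCB1's exploration rate decays automatically as evidence accumulates, which is why its regret grows only logarithmically in the stochastic (non-adversarial) setting.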