2013
DOI: 10.1109/jstsp.2013.2263494

Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

Abstract: In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies. It is shown that for all light-tailed reward distributions, DSEE achieves the optimal logarithmic order of the regret, where regret is defined …
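To make the structure described in the abstract concrete, the following is a minimal Python sketch of a DSEE-style policy. It assumes a round-robin exploration order and a c·log(t) exploration budget per arm; the function name, parameters, and exact schedule are illustrative assumptions, not the paper's exact construction.

# Minimal sketch of a DSEE-style policy (illustrative assumptions only).
# Time is split into a deterministic exploration sequence, in which arms
# are played round-robin, and an exploitation sequence, in which the arm
# with the largest sample mean is played.

import math
import random


def dsee(arms, horizon, c=1.0):
    """Run a DSEE-style policy for `horizon` rounds.

    arms    : list of callables, each returning a random reward when played
    horizon : number of rounds T
    c       : exploration constant (assumed tuning parameter)
    """
    k = len(arms)
    counts = [0] * k          # number of times each arm has been played
    means = [0.0] * k         # running sample means
    explore_slots = 0         # size of the exploration sequence so far
    total_reward = 0.0

    for t in range(1, horizon + 1):
        # Deterministic rule: keep the exploration sequence at roughly
        # c * log(t) plays per arm (logarithmic density, as used for
        # light-tailed rewards).
        if explore_slots < k * math.ceil(c * math.log(t + 1)):
            arm = explore_slots % k                       # round-robin exploration
            explore_slots += 1
        else:
            arm = max(range(k), key=lambda i: means[i])   # exploit best sample mean

        reward = arms[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total_reward += reward

    return total_reward, means


# Example with two Bernoulli arms.
if __name__ == "__main__":
    bandit = [lambda: float(random.random() < 0.4),
              lambda: float(random.random() < 0.6)]
    print(dsee(bandit, horizon=10_000))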

Cited by 80 publications (77 citation statements, 2013–2023) | References 27 publications

“…However, the difference between the pseudo-regret defined in [11] and regret in its original definition is in the order of O(√T). They have also shown that a variation of the DSEE policy developed in [15] for risk-neutral MAB achieves O(T^{2/3}) regret performance without the positive difference assumption. While [11] only focuses on policy development, this paper also provides tight lower bounds on both the asymptotic and finite-time regret performance, which serve as fundamental limits for gauging the optimality of learning policies.…”
Section: Related Work (mentioning)
confidence: 99%
“…A variation of the DSEE policy developed in [15] for risk-neutral MAB was considered in [11] and was shown to achieve O(T^{2/3}) finite-time regret performance. In the MV-DSEE policy, time is divided into two interleaving sequences: an exploration sequence denoted by E(t) and an exploitation sequence.…”
Section: B. Risk-averse Learning Policies (mentioning)
confidence: 99%
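For intuition on the interleaving structure quoted above, here is a small illustrative Python sketch of how a deterministic exploration sequence E(t) could be generated from a target cardinality rule. The specific growth rules below (logarithmic and T^{2/3}-type) are assumptions chosen to mirror the regret orders discussed, not the exact constructions in [11] or [15].

# Illustrative construction of a deterministic exploration sequence E(t):
# time t is an exploration slot whenever the sequence built so far is
# smaller than a prescribed target cardinality growth(t).

import math


def build_exploration_sequence(horizon, growth):
    """Return the exploration times up to `horizon`.

    growth : function mapping t to the required cardinality of E up to time t,
             e.g. lambda t: math.ceil(2 * math.log(t + 1))   # logarithmic
             or   lambda t: math.ceil(t ** (2 / 3))          # polynomial
    """
    explore_times = []
    for t in range(1, horizon + 1):
        if len(explore_times) < growth(t):
            explore_times.append(t)
    return explore_times


# Example: logarithmic vs. polynomial exploration density over 1000 rounds.
log_seq = build_exploration_sequence(1000, lambda t: math.ceil(2 * math.log(t + 1)))
poly_seq = build_exploration_sequence(1000, lambda t: math.ceil(t ** (2 / 3)))
print(len(log_seq), len(poly_seq))   # sparse vs. much denser exploration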
“…In [18], the authors presented a multiuser spectrum access model. When multiple users are present in the network, collisions are induced.…”
Section: Introduction (mentioning)
confidence: 99%
“…When multiple users are present in the network, collisions are induced. To address this, an adaptive random access model was presented in [18], a fair access model in [19], and a priority access model in [20] to reduce collisions among cognitive users. It is seen from existing research [21], [22] that most of these schemes are limited to providing only one channel at a time to a cognitive user.…”
Section: Introduction (mentioning)
confidence: 99%
“…For learning the shortest path, we can simply treat each path as an arm and directly apply existing MAB policies developed in [2][3][4][5][6]. This approach, however, results in a regret growing linearly with the number of paths, thus exponentially with the number of edges in the worst case.…”
Section: Introduction (mentioning)
confidence: 99%
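As an aside on the path-as-arm reduction quoted above, the following toy Python sketch (the graph and the helper function are hypothetical, for illustration only) enumerates all simple paths of a small graph, showing how the number of "arms" grows quickly with the number of edges.

# Toy illustration of the naive reduction: treat every source-destination
# path as one arm. Even small graphs yield many path-arms.

def all_simple_paths(adj, src, dst, path=None):
    """Enumerate all simple (cycle-free) paths from src to dst
    in an adjacency dictionary."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    paths = []
    for nxt in adj.get(src, []):
        if nxt not in path:               # keep paths simple (no cycles)
            paths.extend(all_simple_paths(adj, nxt, dst, path))
    return paths


# Small example graph: each resulting path would be handled as one arm.
adj = {"s": ["a", "b"], "a": ["b", "t"], "b": ["a", "t"]}
arms = all_simple_paths(adj, "s", "t")
print(len(arms), arms)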