2008
DOI: 10.1239/aap/1214950209

Index policies for discounted bandit problems with availability constraints

Abstract: The multi-armed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both probl…
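To make the idea of an index policy under availability constraints concrete, here is a minimal sketch. It does not reproduce the paper's Whittle index, whose form is not given in this excerpt. Instead it assumes the classical Gittins index as a stand-in, computed for each finite-state arm with the largest-remaining-index algorithm, and simply skips arms that are unavailable in the current period. The function names `gittins_indices` and `choose_arm` and the `(P, r)` arm format are hypothetical. Because the abstract states that no index policy is optimal in every instance of the problem, treat this as a heuristic illustration only.

```python
import numpy as np


def gittins_indices(P, r, beta):
    """Gittins indices of every state of one finite-state arm (stand-in index).

    Largest-remaining-index algorithm: states are ranked one at a time, and a
    candidate state's index is the discounted reward per unit of discounted
    time earned by playing from it until the arm first moves to a state
    outside the set S of already-ranked (higher-index) states.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    index = np.empty(n)
    ranked = []                       # S: states ranked so far, highest index first
    remaining = set(range(n))

    # The top-ranked state is the one with the largest one-step reward.
    top = int(np.argmax(r))
    index[top] = r[top]
    ranked.append(top)
    remaining.remove(top)

    while remaining:
        # Keep only transitions into S, so play stops on moving outside S.
        Q = np.zeros_like(P)
        Q[:, ranked] = P[:, ranked]
        A = np.eye(n) - beta * Q                # invertible because beta < 1
        reward = np.linalg.solve(A, r)          # discounted reward until stopping
        time = np.linalg.solve(A, np.ones(n))   # discounted time until stopping
        best = max(remaining, key=lambda i: reward[i] / time[i])
        index[best] = reward[best] / time[best]
        ranked.append(best)
        remaining.remove(best)
    return index


def choose_arm(arms, states, available, beta):
    """Play the available arm whose current state has the largest index.

    arms:      list of (P, r) pairs, one per arm (hypothetical input format)
    states:    current state of each arm
    available: booleans; an unavailable arm cannot be played this period
    Returns the chosen arm number, or None if no arm is available.
    """
    best_arm, best_value = None, -np.inf
    for k, ((P, r), s, ok) in enumerate(zip(arms, states, available)):
        if not ok:
            continue                  # availability constraint: skip this arm
        value = gittins_indices(P, r, beta)[s]
        if value > best_value:
            best_arm, best_value = k, value
    return best_arm


if __name__ == "__main__":
    # Arm 0: state 0 pays 0 and moves to state 1, which pays 1 forever.
    # Arm 1: a single state paying 0.5 forever.
    arms = [([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0]), ([[1.0]], [0.5])]
    print(gittins_indices(*arms[0], beta=0.9))          # approximately [0.9, 1.0]
    print(choose_arm(arms, [0, 0], [True, True], 0.9))  # 0
    print(choose_arm(arms, [0, 0], [False, True], 0.9)) # 1
```

The sketch recomputes each arm's indices at every decision for clarity; a practical version would compute them once per arm and cache them, since the Gittins index of a state does not depend on the other arms.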


Cited by 15 publications (9 citation statements)
References 26 publications
“…We next show the indexability and compute the closed form expression for the index of a single-armed rested bandit. The proof of index computation is along the lines of [13].…”
Section: Remark (mentioning)
confidence: 99%
“…For example, in a machine-repair problem one may not be able to schedule a task on some of the machines due to machine breakdown. Such consideration has been made in [13]. In queuing systems, the controller may not be able to schedule jobs to some servers due to server breakdown, [14], [15].…”
Section: Introduction (mentioning)
confidence: 99%
“…However, at any stage, the decision maker chooses only one arm, and there is no passive reward for the arms which are not chosen. Dayanik et al. (2007) have proved that when the passive rewards are equal to zero, Whittle's index converges to Gittins' index. In fact, for the optimal selection of obsolescence strategies, since only one strategy can be chosen at any period and passive arms carry no reward, there is no difference between Whittle's and Gittins' indices.…”
Section: Bandit Process Approach For Optimal Selection Of Obsolescenc… (mentioning)
confidence: 99%