Index policies for discounted bandit problems with availability constraints

Dayanık, Savaş; Powell, Warren B.; Yamazaki, Kazutoshi

doi:10.1239/aap/1214950209

Cited by 15 publications

(9 citation statements)

References 26 publications

“…We next show the indexability and compute the closed form expression for the index of a single-armed rested bandit. The proof of index computation is along the lines of [13].…”

Section: Remark (mentioning)

confidence: 99%

“…For example, in a machine-repair problem one may not be able to schedule a task on some of the machines due to machine breakdown. Such consideration has been made in [13]. In queuing systems, the controller may not be able to schedule jobs to some servers due to server breakdown [14], [15].…”

Section: Introduction (mentioning)

confidence: 99%

“…In constrained bandits [13], [14], [15], each state is defined as a pair (X(t), Y(t)), where X(t) represents the state of arm and Y(t) represents availability of an arm at time t. Time is discretized in [13] while it is continuous in the models of [14], [15]. The state (X(t), Y(t)) is assumed to be observable.…”

Section: Introduction (mentioning)

confidence: 99%

“…The state (X(t), Y(t)) is assumed to be observable. Under some assumptions on model parameters the index policy is analyzed in [13], [14], [15]. In this paper we consider a hidden Markov model, where state X(t) of the arm is not observable but the availability of the arm is observable.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-armed Bandits with Constrained Arms and Hidden States

Mehta, Meshram, Kaza et al. 2017

Preprint

The problem of rested and restless multi-armed bandits with constrained availability of arms is considered. The states of arms evolve in a Markovian manner and the exact states are hidden from the decision maker. First, some structural results on value functions are claimed. Following these results, the optimal policy turns out to be a threshold policy. Further, indexability of rested bandits is established and an index formula is derived. The performance of the index policy is illustrated and compared with the myopic policy using numerical examples.

“…However, at any stage, the decision maker chooses only one arm, and there is no passive reward for the arms which are not chosen. Dayanik et al. (2007) have proved that when the passive rewards are equal to zero, the Whittle's index converges to Gittins index. In fact, for the optimal selection of obsolescence strategies, since only one strategy can be chosen at any period and passive arms carry no reward; there is no difference between Whittle's and Gittins indices.…”

Section: Bandit Process Approach For Optimal Selection Of Obsolescenc… (mentioning)

confidence: 99%