1997
DOI: 10.1287/moor.22.1.222

Optimal Adaptive Policies for Markov Decision Processes

Abstract: In this paper we consider the problem of adaptive control for Markov Decision Processes. We give the explicit form for a class of adaptive policies that possess optimal increase rate properties for the total expected finite horizon reward, under sufficient assumptions of finite state-action spaces and irreducibility of the transition law. A main feature of the proposed policies is that the choice of actions, at each state and time period, is based on indices that are inflations of the right-hand side of the es…
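To make the construction described in the abstract concrete, the sketch below shows one generic way an index-based adaptive policy can be organized: at each visited state, choose the action maximizing an inflated estimate of the right-hand side of the average-reward optimality equation under the empirical model. This is a minimal illustration, not the paper's exact policy; the square-root bonus term, the bias vector `h`, and all function and variable names are assumptions introduced here for illustration only.

```python
import numpy as np

def index_action(s, t, r, counts, h):
    """Minimal sketch (not the paper's exact policy): at state s and
    time t, pick the action maximizing an inflated estimate of
    r(s, a) + sum_j p_hat(j | s, a) * h(j), i.e. the right-hand side
    of the average-reward optimality equation under the empirical model.

    r      : rewards, shape (S, A)                 -- assumed known
    counts : observed transition counts, shape (S, A, S)
    h      : current bias (relative value) estimate, shape (S,)
    """
    n_actions = r.shape[1]
    best_a, best_index = 0, -np.inf
    for a in range(n_actions):
        n_sa = counts[s, a].sum()
        if n_sa == 0:
            return a                       # try every action at least once
        p_hat = counts[s, a] / n_sa        # empirical transition law at (s, a)
        estimate = r[s, a] + p_hat @ h     # estimated optimality-equation RHS
        bonus = np.sqrt(2.0 * np.log(t + 1) / n_sa)  # generic optimism inflation (assumed form)
        if estimate + bonus > best_index:
            best_a, best_index = a, estimate + bonus
    return best_a
```

In a full implementation, `h` would be recomputed from the empirical model as learning proceeds; the specific inflation that yields the optimal rate is derived in the paper, whereas the square-root bonus above is only a familiar stand-in.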


Citations: cited by 190 publications (137 citation statements)
References: 27 publications
“…In particular, assuming irreducibility of the transition matrices, an asymptotically logarithmic regret is possible (Burnetas and Katehakis, 1997; Tewari and Bartlett, 2008): L_A(T) ≤ C · ln T for some constant C. Unfortunately, the value of C depends on the dynamics of the underlying MDP and can in fact be arbitrarily large.…”
Section: Regret Minimization (mentioning)
confidence: 99%
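For orientation, the regret L_A(T) in the quoted bound is the cumulative shortfall of a learning policy A against the optimal long-run average reward g*; a logarithmic bound means this shortfall grows like ln T. A minimal sketch of the bookkeeping, with `g_star` assumed known purely for evaluation:

```python
def cumulative_regret(rewards, g_star):
    """L_A(T) = T * g_star - (total reward collected by policy A in T steps).
    `rewards` is the sequence of rewards earned by the learning policy;
    `g_star` is the optimal long-run average reward (assumed known here
    only for evaluation).  The quoted bound states L_A(T) <= C * ln(T)
    for irreducible MDPs, with C depending on the MDP's dynamics.
    """
    return len(rewards) * g_star - sum(rewards)
```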
“…Katehakis and Robbins [32], Burnetas et al [7], Burnetas and Katehakis [8], Ortner and Auer [40], Oksanen et al [39]. For other related work we refer to the following: Flint et al [17], Fernández-Gaucherand et al [15], Govindarajulu and Katehakis [25], Honda and Takemura [26], Tekin and Liu [47], Tewari and Bartlett [48], Filippi et al [16], Bertsekas [4], Bubeck and Cesa-Bianchi [5] and Burnetas et al [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…These papers have a different objective than ours, as they focus on minimizing regret by constructing adaptive index policies that possess optimal increase rate properties. This approach has been extended to finite state and action MDPs with incomplete information (Burnetas and Katehakis [6]) and to adversarial bandits that either make no assumption whatsoever on the process generating the payoffs of the bandits (Auer et al. [1]) or bound its variation within a "variation budget" (Besbes et al. [4]). At the time of submission we became aware of the work of Kim and Lim [18], who also study the RMAB problem but with an alternative formulation in which deviations of the transition probabilities from their point estimates are penalized, so the analysis is essentially different from ours.…”
Section: Introduction (mentioning)
confidence: 99%