2006
DOI: 10.1007/11871842_63

Reinforcement Learning for MDPs with Constraints

Abstract: In this article, I will consider Markov Decision Processes with two criteria, each defined as the expected value of an infinite-horizon cumulative return. The second criterion is either itself subject to an inequality constraint, or there is a maximum allowable probability that the single returns violate the constraint. I describe and discuss three new reinforcement learning approaches for solving such control problems.
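
The two problem classes the abstract refers to can be sketched roughly as follows; the symbols (discount factor \gamma, reward r_t, cost c_t, bound C, probability level \alpha) are illustrative notation chosen here, not taken from the paper:

\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} c_{t}\right] \le C
\quad \text{or} \quad
\Pr_{\pi}\!\left(\sum_{t=0}^{\infty} \gamma^{t} c_{t} > C\right) \le \alpha .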

Cited by 82 publications (75 citation statements). References 4 publications.

“…Several prior works have tried to formalize the online advertising problem as a reinforcement learning framework. In [11,18], the authors fit the banner delivery and the ad allocation problems into the MAB model, with the rewards being the number of ad clicks and the profits. However, these prior works assume no cost for showing the impressions and thus consider no constraints.…”
Section: Related Work
confidence: 99%
“…We consider the predicted click-through rate (CTR) as the state, the number of clicks as the reward to maximize, the market price as the cost, and the budget limit as the constraint. We integrate the optimization problem and the budget-limit condition into the model and use the linear programming method [11] to solve the CMDP. The policy derived from the solution gives an optimal bid price for each state.…”
Section: Introduction
confidence: 99%
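
The excerpt above describes solving the constrained bidding problem as a CMDP via linear programming. A standard way to do this (not necessarily the exact formulation used in the cited work) is the occupancy-measure LP; the following Python sketch illustrates it on a tiny discounted CMDP in which every number (transitions, rewards, costs, budget, discount) is invented purely for illustration.

# Minimal occupancy-measure LP for a tiny discounted constrained MDP (CMDP).
# All numbers below are illustrative, not taken from the cited work.
import numpy as np
from scipy.optimize import linprog

n_s, n_a = 2, 2                 # states, actions
gamma = 0.9                     # discount factor
mu = np.array([0.5, 0.5])       # initial state distribution

# P[s, a, s'] transition probabilities; r and c are reward and cost per (s, a).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
r = np.array([[1.0, 0.2], [0.0, 0.8]])
c = np.array([[0.5, 0.1], [0.4, 0.9]])
budget = 3.5                    # bound on expected discounted cumulative cost

# Variables: occupancy measure x(s, a), flattened to a vector of length n_s * n_a.
# Flow constraints: sum_a x(s', a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu(s').
A_eq = np.zeros((n_s, n_s * n_a))
for s2 in range(n_s):
    for s in range(n_s):
        for a in range(n_a):
            A_eq[s2, s * n_a + a] = (s == s2) - gamma * P[s, a, s2]
b_eq = mu

# Cost (budget) constraint: sum_{s,a} c(s,a) x(s,a) <= budget.
A_ub, b_ub = c.reshape(1, -1), np.array([budget])

# linprog minimizes, so negate the rewards to maximize expected return.
res = linprog(-r.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
assert res.success, res.message
x = res.x.reshape(n_s, n_a)
policy = x / x.sum(axis=1, keepdims=True)   # optimal (possibly stochastic) policy
print(policy)

The policy is recovered by normalizing the optimal occupancy measure over actions; when the cost constraint is active, the optimal CMDP policy is in general randomized rather than deterministic.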
“…However, in Bayes' nets the constraints only provide information about the immediate action, whereas in MDPs the policies are sequential in nature and need to account for possible future plans. Constrained reinforcement learning [7] and constrained MDPs [4] have been proposed to handle multi-objective scenarios, but the constraints in these cases are typically of a form that limits the value of a policy. In our case, the constraints that arise from expert feedback are imposed on the Q function instead of the policy, which makes the problem non-convex and harder to solve.…”
Section: Related Work
confidence: 99%
“…Besides, the consideration of statistical uncertainty is similar to, but strictly demarcated from, other issues that deal with uncertainty and risk. Consider the work of Heger (1994) and of Geibel (2001). They deal with risk in the context of undesirable states.…”
Section: Related Work
confidence: 99%