2006
DOI: 10.1007/11871842_63

Reinforcement Learning for MDPs with Constraints

Abstract: In this article, I will consider Markov Decision Processes with two criteria, each defined as the expected value of an infinite-horizon cumulative return. The second criterion is either itself subject to an inequality constraint, or there is a maximum allowable probability that the single returns violate the constraint. I describe and discuss three new reinforcement learning approaches for solving such control problems.
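
The two problem classes the abstract refers to can be sketched roughly as follows; the symbols (discount factor \gamma, reward r_t, cost c_t, bound C, probability level \alpha) are illustrative notation chosen here, not taken from the paper:

\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} c_{t}\right] \le C
\quad \text{or} \quad
\Pr_{\pi}\!\left(\sum_{t=0}^{\infty} \gamma^{t} c_{t} > C\right) \le \alpha .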

Cited by 82 publications (75 citation statements). References 4 publications.

“…Several prior works have tried to formalize the online advertising problem as a reinforcement learning framework. In [11,18], the authors fit the banner delivery and the ad allocation problems into the MAB model, with the rewards being the number of ad clicks and the profits. However, these prior works assume no cost for showing the impressions and thus consider no constraints.…”
Section: Related Work
confidence: 99%
“…We consider the predicted click-through rate (CTR) as the state, the number of clicks as the reward to maximize, the market price as the cost, and the budget limit as the constraint. We integrate the optimization problem and the budget-limit condition into the model and use the linear programming method [11] to solve the CMDP. The policy derived from the solution gives an optimal bid price for each state.…”
Section: Introduction
confidence: 99%
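
The excerpt above describes solving the constrained bidding problem as a CMDP via linear programming. A standard way to do this (not necessarily the exact formulation used in the cited work) is the occupancy-measure LP; the following Python sketch illustrates it on a tiny discounted CMDP in which every number (transitions, rewards, costs, budget, discount) is invented purely for illustration.

# Minimal occupancy-measure LP for a tiny discounted constrained MDP (CMDP).
# All numbers below are illustrative, not taken from the cited work.
import numpy as np
from scipy.optimize import linprog

n_s, n_a = 2, 2                 # states, actions
gamma = 0.9                     # discount factor
mu = np.array([0.5, 0.5])       # initial state distribution

# P[s, a, s'] transition probabilities; r and c are reward and cost per (s, a).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
r = np.array([[1.0, 0.2], [0.0, 0.8]])
c = np.array([[0.5, 0.1], [0.4, 0.9]])
budget = 3.5                    # bound on expected discounted cumulative cost

# Variables: occupancy measure x(s, a), flattened to a vector of length n_s * n_a.
# Flow constraints: sum_a x(s', a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu(s').
A_eq = np.zeros((n_s, n_s * n_a))
for s2 in range(n_s):
    for s in range(n_s):
        for a in range(n_a):
            A_eq[s2, s * n_a + a] = (s == s2) - gamma * P[s, a, s2]
b_eq = mu

# Cost (budget) constraint: sum_{s,a} c(s,a) x(s,a) <= budget.
A_ub, b_ub = c.reshape(1, -1), np.array([budget])

# linprog minimizes, so negate the rewards to maximize expected return.
res = linprog(-r.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
assert res.success, res.message
x = res.x.reshape(n_s, n_a)
policy = x / x.sum(axis=1, keepdims=True)   # optimal (possibly stochastic) policy
print(policy)

The policy is recovered by normalizing the optimal occupancy measure over actions; when the cost constraint is active, the optimal CMDP policy is in general randomized rather than deterministic.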
“…However, in Bayes' nets the constraints only provide information about the immediate action, whereas in MDPs the policies are sequential in nature and need to account for possible future plans. Constrained reinforcement learning [7] and constrained MDPs [4] have been proposed to handle multi-objective scenarios, but the constraints in these cases are typically of a form that limits the value of a policy. In our case, the constraints that arise from expert feedback are imposed on the Q function instead of the policy, which makes the problem non-convex and harder to solve.…”
Section: Related Work
confidence: 99%
“…Besides, the consideration of statistical uncertainty is similar to, but strictly demarcated from, other issues that deal with uncertainty and risk. Consider the work of Heger (1994) and of Geibel (2001). They deal with risk in the context of undesirable states.…”
Section: Related Work
confidence: 99%