Proceedings of the 27th ACM International Conference on Information and Knowledge Management 2018
DOI: 10.1145/3269206.3271748
Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising

Abstract: Real-time bidding (RTB) is an important mechanism in online display advertising, where a proper bid for each page view plays an essential role in achieving good marketing results. Budget constrained bidding is a typical scenario in RTB where advertisers hope to maximize the total value of the winning impressions under a pre-set budget constraint. However, the optimal bidding strategy is hard to derive due to the complexity and volatility of the auction environment. To address these challenges, in this paper, we…
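To make the constrained objective in the abstract concrete, the sketch below selects impressions to maximize total value within a budget using a simple value-per-cost greedy heuristic. This is only an illustration of the optimization problem, not the paper's method (the paper derives bids via model-free reinforcement learning); the function and data names are assumptions for the example.

```python
def select_impressions(impressions, budget):
    """Greedy value-per-cost selection under a budget.

    impressions: list of (value, cost) pairs.
    Returns (chosen indices, total spend).
    """
    # Rank impressions by value per unit cost, best first.
    order = sorted(range(len(impressions)),
                   key=lambda i: impressions[i][0] / impressions[i][1],
                   reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        value, cost = impressions[i]
        # Take an impression only if it still fits in the budget.
        if spent + cost <= budget:
            chosen.append(i)
            spent += cost
    return chosen, spent

# Hypothetical (value, cost) pairs for four page views.
imps = [(3.0, 1.0), (2.0, 2.0), (5.0, 4.0), (1.0, 0.5)]
picked, spent = select_impressions(imps, budget=3.0)
```

In a real RTB setting the advertiser cannot choose impressions directly, only bids; the greedy selection above just shows what the budget constraint trades off.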

Cited by 90 publications (101 citation statements); references 32 publications (42 reference statements).
“…The bid optimization problem is very actively studied in real-time bidding [18,23,36,38], and several formulations and algorithms have been proposed in the display advertising scenario. The models of [6,29,35] maximize advertising value within the budget, without considering a KPI constraint. Other work specifically addresses the KPI constraint, such as [10,34].…”
Section: Related Work
confidence: 99%
“…In our case, however, we have to adjust the scores (which are continuous variables) of hundreds of millions of items to maximize the total reward, which drives us to model the state transitions explicitly. The other reasons for adopting RL instead of a contextual bandit are as follows: 1) Wu et al [37] show that modeling trajectory constraints in RL leads to higher profits, since RL can naturally track changes in the constraints over the long run and make longer-term decisions. 2) Hu et al [16] further confirm that RL methods can bring higher long-term returns than contextual bandit methods for ranking recommended products in e-commerce.…”
Section: Problem Formulation
confidence: 99%
“…The previous name is: Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning. successful applications of DRL techniques to optimize the decision-making process in e-commerce from different aspects, including online recommendation [11], impression allocation [10,41], advertising bidding strategies [19,37,40] and product ranking [16].…”
Section: Introduction
confidence: 99%
“…The general bidding problem with nonstationary stochastic volume and a partially observed market is a complex reinforcement learning (RL) problem, tackled in [Wu et al 2018] using tools from the deep reinforcement learning literature. [Wu et al 2018] uses, as is done in this paper, the common approach of bidding proportionally to the predicted KPI probability, and solves a control problem over this proportionality factor every few minutes instead of optimizing every impression. This makes the approach practical for real use.…”
Section: Related Papers
confidence: 99%
“…It makes the approach practical for real use. [Wu et al 2018] finds that the immediate reward is misleading during training, pushing toward solutions that neglect the budget constraint. The approach proposed in this paper introduces the budget constraint into the reward by simply adding a linear penalty.…”
Section: Related Papers
confidence: 99%
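The linear-penalty reward shaping mentioned in the last statement amounts to subtracting a cost term from the impression value. A minimal sketch, where the penalty weight `lam` is a hypothetical hyperparameter rather than a value from either paper:

```python
def penalized_reward(value, cost, lam):
    """Reward with the budget constraint folded in as a linear
    penalty: impression value minus lam times its cost. Larger lam
    makes the learner more averse to spending."""
    return value - lam * cost
```

With `lam = 0`, this reduces to the plain immediate reward that [Wu et al 2018] found misleading; a positive `lam` discounts high-cost wins so the learned policy respects the budget.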