Proceedings of the 27th ACM International Conference on Information and Knowledge Management 2018
DOI: 10.1145/3269206.3271748
Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising

Abstract: Real-time bidding (RTB) is an important mechanism in online display advertising, where a proper bid for each page view plays an essential role in achieving good marketing results. Budget constrained bidding is a typical scenario in RTB where advertisers hope to maximize the total value of the winning impressions under a pre-set budget constraint. However, the optimal bidding strategy is hard to derive due to the complexity and volatility of the auction environment. To address these challenges, in this paper, we…
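To make the constrained objective in the abstract concrete, the sketch below selects impressions to maximize total value within a budget using a simple value-per-cost greedy heuristic. This is only an illustration of the optimization problem, not the paper's method (the paper derives bids via model-free reinforcement learning); the function and data names are assumptions for the example.

```python
def select_impressions(impressions, budget):
    """Greedy value-per-cost selection under a budget.

    impressions: list of (value, cost) pairs.
    Returns (chosen indices, total spend).
    """
    # Rank impressions by value per unit cost, best first.
    order = sorted(range(len(impressions)),
                   key=lambda i: impressions[i][0] / impressions[i][1],
                   reverse=True)
    chosen, spent = [], 0.0
    for i in order:
        value, cost = impressions[i]
        # Take an impression only if it still fits in the budget.
        if spent + cost <= budget:
            chosen.append(i)
            spent += cost
    return chosen, spent

# Hypothetical (value, cost) pairs for four page views.
imps = [(3.0, 1.0), (2.0, 2.0), (5.0, 4.0), (1.0, 0.5)]
picked, spent = select_impressions(imps, budget=3.0)
```

In a real RTB setting the advertiser cannot choose impressions directly, only bids; the greedy selection above just shows what the budget constraint trades off.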

Cited by 90 publications (101 citation statements); references 32 publications (42 reference statements).
“…The bid optimization problem is very actively studied in real-time bidding [18,23,36,38], and several formulations and algorithms have been proposed in the display advertising scenario. The models of [6,29,35] maximize advertising value within the budget, without considering a KPI constraint. Other work specifically addresses the KPI constraint, such as [10,34].…”
Section: Related Work
confidence: 99%
“…In our case, however, we have to adjust the scores (which are continuous variables) of hundreds of millions of items to maximize the total reward, which drives us to model the state transitions explicitly. The other reasons for adopting RL instead of a contextual bandit are as follows: 1) Wu et al [37] show that modeling trajectory constraints in RL leads to higher profits, since RL can naturally track changes in the constraints over the long run and make longer-term decisions. 2) Hu et al [16] further confirm that RL methods can bring higher long-term returns than contextual bandit methods for ranking recommended products in e-commerce.…”
Section: Problem Formulation
confidence: 99%
“…The previous name is: Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning. successful applications of DRL techniques to optimize the decision-making process in e-commerce from different aspects, including online recommendation [11], impression allocation [10,41], advertising bidding strategies [19,37,40] and product ranking [16].…”
Section: Introduction
confidence: 99%
“…The general bidding problem with nonstationary stochastic volume and a partially observed market is a complex reinforcement learning (RL) problem, tackled in [Wu et al 2018] using tools from the deep reinforcement learning literature. [Wu et al 2018] uses, as is done in this paper, the common approach of bidding proportionally to the predicted KPI probability, and solves a control problem over this proportionality factor every few minutes instead of optimizing every impression. This makes the approach practical for real use.…”
Section: Related Papers
confidence: 99%
“…It makes the approach practical for real use. [Wu et al 2018] finds that the immediate reward is misleading during training, pushing toward solutions that neglect the budget constraint. The approach proposed in this paper introduces the budget constraint into the reward by simply adding a linear penalty.…”
Section: Related Papers
confidence: 99%
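The linear-penalty reward shaping mentioned in the last statement amounts to subtracting a cost term from the impression value. A minimal sketch, where the penalty weight `lam` is a hypothetical hyperparameter rather than a value from either paper:

```python
def penalized_reward(value, cost, lam):
    """Reward with the budget constraint folded in as a linear
    penalty: impression value minus lam times its cost. Larger lam
    makes the learner more averse to spending."""
    return value - lam * cost
```

With `lam = 0`, this reduces to the plain immediate reward that [Wu et al 2018] found misleading; a positive `lam` discounts high-cost wins so the learned policy respects the budget.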