Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18 2018
DOI: 10.1145/3178876.3186039

Reinforcement Mechanism Design for e-commerce

Abstract: We study the problem of allocating impressions to sellers in e-commerce websites, such as Amazon, eBay or Taobao, aiming to maximize the total revenue generated by the platform. We employ a general framework of reinforcement mechanism design, which uses deep reinforcement learning to design efficient algorithms, taking the strategic behaviour of the sellers into account. Specifically, we model the impression allocation problem as a Markov decision process, where the states encode the history of impressions, pri…

Citations: cited by 70 publications (65 citation statements)
References: 33 publications (42 reference statements)

“…Since the bidding strategies optimization in online advertising could be modeled as a sequential decision problem, several works utilized RL methods to solve it. Cai et al [4] formulated the impression allocation problem as an MDP and solved it by an actor-critic policy gradient algorithm based on DDPG. Cai et al [3] formulated a Markov Decision Process framework to learn sequential allocation of campaign budgets.…”
Section: RL Methods For Bidding Strategies (mentioning)
confidence: 99%
“…But because $\pi$ is not permutation invariant, we find a policy $\pi^*(P(c)) = \pi((P^* P^T P)(c))$ that is permutation invariant, where $P^* = \arg\max_{P \in \mathcal{P}} R(P(c)_{\pi(P(c))})$, then
$$R(c_{\pi^*(c)}) = R(P^*(c)_{\pi(P^*(c))}) > \frac{1}{|\mathcal{P}|} \sum_{P \in \mathcal{P}} R(P(c)_{\pi(P(c))}), \tag{8}$$
which leads to a contradictory to (6) and (7). So it must be that Lemma 1 holds.…”
Section: Definition 1 (Permutation Invariant Policy) (mentioning)
confidence: 99%
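
The quoted step relies on a simple fact: the best value over a finite set of permutations is at least the average over that set, and strictly greater when the values are not all equal. The sketch below illustrates only that inequality. The `reward` function, its slot weights, and the candidate values are hypothetical and are not taken from the citing paper, and the policy $\pi$ is left out entirely.

```python
from itertools import permutations
from statistics import mean


def reward(ordering):
    """Hypothetical position-weighted reward: earlier slots count more."""
    weights = [3, 2, 1]  # assumed slot weights, for illustration only
    return sum(w * v for w, v in zip(weights, ordering))


def best_vs_average(candidates):
    """Return (max reward, mean reward) over all orderings of candidates."""
    rewards = [reward(p) for p in permutations(candidates)]
    return max(rewards), mean(rewards)


# Three hypothetical candidate values; the best ordering beats the average.
best, avg = best_vs_average((5, 1, 4))
```

With these assumed values the best ordering places the largest value in the heaviest slot, so its reward exceeds the mean over all six orderings.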
“…The state is then transitioned into the next state. Such a model is tailored for a wide range of important realistic applications such as personalized recommender systems where users' preferences are regarded as states and items are regarded as items with contexts [20,26], and e-commerce where the private information (e.g., cost, reputation) of sellers can be viewed as states and different commercial strategies are regarded as contexts [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…The previous name is: Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning. successful applications of DRL techniques to optimize the decision-making process in E-commerce from different aspects including online recommendation [11], impression allocation [10,41], advertising bidding strategies [19,37,40] and product ranking [16].…”
Section: Introduction (mentioning)
confidence: 99%