2021
DOI: 10.48550/arxiv.2110.07923
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Value Penalized Q-Learning for Recommender Systems

Chengqian Gao,
Ke Xu,
Peilin Zhao

Abstract: Scaling reinforcement learning (RL) to recommender systems (RS) is promising since maximizing the expected cumulative rewards for RL agents meets the objective of RS, i.e., improving customer's long-term satisfaction. A key approach to this goal is offline RL, which aims to learn policies from logged data. However, the high-dimensional action space and the non-stationary dynamics in commercial RS intensify distributional shift issues, making it challenging to apply offline RL methods to RS. To alleviate the ac… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 27 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?