Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2019
DOI: 10.1145/3292500.3330668

Reinforcement Learning to Optimize Long-term User Engagement in Recommender Systems

Abstract: Recommender systems play a crucial role in our daily lives. Feed streaming mechanism has been widely used in the recommender system, especially on the mobile Apps. The feed streaming setting provides users the interactive manner of recommendation in never-ending feeds. In such a manner, a good recommender system should pay more attention to user stickiness, which is far beyond classical instant metrics and typically measured by long-term user engagement. Directly optimizing long-term user engagement is a non-t…

Cited by 178 publications (121 citation statements)
References: 33 publications
“…In this work, the exploration policy is parameterized with multichannel stacked self-attention neural networks, which separately capture the information of versatile user behaviors since different rewarding recommendations for a specific user are usually extremely imbalanced (e.g., liking items usually are much fewer than disliking items) [18,45,48]. In Figure 2, we presented the neural architecture for exploration policy, which consists of an embedding layer, self-attentive blocks, and a policy layer.…”
Section: Self-attentive Neural Policy (mentioning)
confidence: 99%
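
The three stages named in the statement above (embedding layer, self-attentive blocks, policy layer) can be sketched roughly as follows. This is an illustrative NumPy sketch, not the citing paper's implementation: it uses a single channel rather than the multichannel design the statement describes, and the function names, parameter shapes, residual connection, mean pooling, and random initialization are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def init_params(n_items, dim, n_blocks, seed=0):
    """Random parameters (hypothetical shapes, for illustration only)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(dim)
    return {
        # Embedding layer: one vector per item.
        "embedding": rng.normal(0.0, scale, size=(n_items, dim)),
        # Each self-attentive block holds query, key, and value projections.
        "blocks": [
            tuple(rng.normal(0.0, scale, size=(dim, dim)) for _ in range(3))
            for _ in range(n_blocks)
        ],
        # Policy layer: maps the pooled state to one logit per item.
        "policy": rng.normal(0.0, scale, size=(dim, n_items)),
    }


def self_attention_block(h, w_q, w_k, w_v):
    """Scaled dot-product self-attention with a residual connection."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return h + softmax(scores, axis=-1) @ v


def policy_probs(item_ids, params):
    """Return a probability distribution over items given a behavior history."""
    h = params["embedding"][np.asarray(item_ids)]  # (history_len, dim)
    for w_q, w_k, w_v in params["blocks"]:
        h = self_attention_block(h, w_q, w_k, w_v)
    state = h.mean(axis=0)  # assumed pooling: average over the history
    return softmax(state @ params["policy"])
```

For example, `policy_probs([1, 3, 2], init_params(n_items=10, dim=8, n_blocks=2))` returns a length-10 vector of non-negative probabilities that sums to 1. A multichannel variant would presumably run one such stack per behavior type and combine the pooled states before the policy layer.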
“…DRN updates periodically after obtaining a long-term reward such as return time [29]. An algorithm is proposed to use two individual LSTM modules for items with short-term and long-term rewards respectively [30]. The diversity of recommended sets is added to the reward function [15].…”
Section: Related Work (mentioning)
confidence: 99%
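
One way to read "diversity is added to the reward function" is as an additive bonus on top of an engagement signal. The sketch below assumes that reading. The diversity measure (mean pairwise cosine distance over the recommended set), the `weight` parameter, and both function names are hypothetical and are not taken from reference [15] or from the cited paper.

```python
import numpy as np


def intra_list_diversity(item_vecs):
    """Mean pairwise cosine distance between item vectors (assumed non-zero)."""
    vecs = np.asarray(item_vecs, dtype=float)
    n = len(vecs)
    if n < 2:
        return 0.0  # a single item has no pairwise diversity
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sim = unit @ unit.T
    upper = np.triu_indices(n, k=1)  # each unordered pair counted once
    return float(np.mean(1.0 - sim[upper]))


def shaped_reward(engagement, item_vecs, weight=0.1):
    """Engagement reward plus a weighted diversity bonus (illustrative only)."""
    return engagement + weight * intra_list_diversity(item_vecs)
```

With this definition a set of identical items earns no bonus, while mutually orthogonal items earn the full `weight`; how strongly to weight diversity against engagement is a tuning decision the quoted text does not specify.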
“…Recommender Systems, which aim to recommend items of potential interest to users and solve the information explosion problem, are playing critical roles in E-commerce sites (e.g., Amazon, JD.com, Alibaba) [10,20,41,42], video sharing sites (e.g., YouTube) [7], picture sharing sites (e.g., Pinterest) [36], social networks (e.g., Facebook) [11,13] and so on. For example, in JD.com, one of the largest E-commerce sites in the world, the Recommender System serves more than 0.3 billion users in China, Thailand, Malaysia and other countries, and contributes billions of dollars to the Gross Merchandise Volume (GMV) (i.e., the total sales value for merchandise sold) each year.…”
Section: Introduction (mentioning)
confidence: 99%
“…Many ranking methods for recommendations have been proposed, including tree-based methods [9], deep neural networks [5,7,10,40,41], reinforcement learning [42,43] based models and so on. However, designing a real-world large-scale E-commerce recommender system still faces many challenges, including:…”
Section: Introduction (mentioning)
confidence: 99%