Proceedings of the 12th ACM Conference on Recommender Systems 2018
DOI: 10.1145/3240323.3240374
Deep reinforcement learning for page-wise recommendations

Abstract: Recommender systems can mitigate the information-overload problem by suggesting personalized items to users. In real-world recommendation settings such as e-commerce, a typical interaction between the system and its users is: users are recommended a page of items and provide feedback; the system then recommends a new page of items. To effectively capture such interactions for recommendation, we need to solve two key problems: (1) how to update the recommendation strategy according to users' real-time feedback, and (2) how …
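The interaction loop the abstract describes can be sketched minimally as follows. This is an illustrative toy, not the paper's method: the item catalog, page size, scoring table, and the naive click-based update rule are all hypothetical stand-ins for the learned policy the paper actually proposes.

```python
PAGE_SIZE = 4
CATALOG = list(range(20))           # toy item catalog (hypothetical)

def recommend_page(scores):
    """Pick the top-scoring items as the next page of recommendations."""
    return sorted(CATALOG, key=lambda i: scores[i], reverse=True)[:PAGE_SIZE]

def user_feedback(page, preferred):
    """Simulated user: clicks the recommended items in a hidden preferred set."""
    return [item for item in page if item in preferred]

scores = {i: 0.0 for i in CATALOG}  # stand-in for a learned scoring model
preferred = {2, 5, 11, 17}          # hidden user taste, for simulation only

for step in range(10):
    page = recommend_page(scores)           # system recommends a page
    clicks = user_feedback(page, preferred) # user provides feedback
    for item in page:                       # naive real-time update:
        scores[item] += 1.0 if item in clicks else -0.1
```

After a few pages of feedback, the recommended page converges to the simulated user's preferred items; the paper replaces this naive score update with a deep reinforcement-learning policy.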

Cited by 314 publications (217 citation statements). References 31 publications.
“…On the other hand, although Wu et al [28] proposed to optimize the delayed revisiting time, there is no systematic solution to optimizing delayed metrics for user engagement. Apart from contextual bandits, a series of MDP-based models [5,14,15,23,32,35,39] have been proposed for recommendation tasks. Arnold et al [5] proposed a modified DDPG model to deal with the problem of large discrete action spaces.…”
Section: Reinforcement Learning Based Recommender Systems
confidence: 99%
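The large-discrete-action-space trick this excerpt alludes to is commonly realized by having a DDPG-style actor emit a continuous "proto-action" in an item-embedding space and then snapping it to the nearest discrete items. A hedged sketch of that lookup step, with illustrative shapes and names (not the cited paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
# 1,000 candidate items embedded in an 8-dimensional space (toy numbers)
item_embeddings = rng.normal(size=(1000, 8))

def nearest_items(proto_action, k=5):
    """Map a continuous proto-action to its k nearest discrete items."""
    dists = np.linalg.norm(item_embeddings - proto_action, axis=1)
    return np.argsort(dists)[:k]    # indices of the k closest catalog items

proto = rng.normal(size=8)          # pretend this came from the actor network
candidates = nearest_items(proto, k=5)
```

In the full approach, a critic would then re-rank these k candidates; the nearest-neighbor step is what keeps the action space tractable when the catalog holds millions of items.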
“…Unfortunately, current methods, including Monte Carlo (MC) and temporal-difference (TD) learning, have limitations for offline policy learning in realistic recommender systems: MC-based methods suffer from high variance, especially when facing an enormous action space (e.g., billions of candidate items) in real-world applications; TD-based methods improve efficiency by using bootstrapping in estimation, which, however, is confronted with another notorious problem called the Deadly Triad (i.e., instability and divergence arise whenever function approximation, bootstrapping, and offline training are combined [24]). Unfortunately, state-of-the-art methods [33,34] in recommender systems, which are designed with neural architectures, will inevitably encounter the Deadly Triad problem in offline policy learning.…”
Section: Introduction
confidence: 99%
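The MC-versus-TD contrast this excerpt draws can be made concrete with the two tabular update rules themselves. A minimal sketch (standard textbook updates, not code from the cited papers; the learning rate and discount are illustrative):

```python
GAMMA = 0.9   # discount factor (illustrative)

def mc_update(value, full_return, alpha=0.1):
    """Monte Carlo: regress toward the complete sampled return.

    Unbiased, but high variance -- the full return sums noise over the
    whole trajectory, which worsens with enormous action spaces.
    """
    return value + alpha * (full_return - value)

def td0_update(value, reward, next_value, alpha=0.1):
    """TD(0): bootstrap from the current estimate of the next state's value.

    Lower variance, but bootstrapping is one leg of the Deadly Triad when
    combined with function approximation and offline training.
    """
    target = reward + GAMMA * next_value
    return value + alpha * (target - value)

v_mc = mc_update(0.0, full_return=2.0)                # moves toward raw return
v_td = td0_update(0.0, reward=1.0, next_value=0.5)    # moves toward r + gamma*V
```

The variance/bias trade-off lives entirely in the target: `full_return` is a noisy sample of the whole future, while `reward + GAMMA * next_value` substitutes the agent's own estimate for everything beyond one step.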
“…Matrix factorization based algorithms [27,28] are widely used to tackle recommendation problems. Recently, recommendation algorithms have achieved remarkable improvements with the help of deep learning models [12,14,30,47,49] and the successful introduction of side information [15,19,22,23,34]. In this study, we focus on introducing side information from the knowledge graph for recommendation; there are already two types of studies using knowledge graphs in recommendation: path-based and embedding-learning-based.…”
Section: Related Work
confidence: 99%
“…This is possible because (i) edge storage and compute resources have become more powerful with various system-on-chip (SoC) technologies and (ii) there is a data-privacy practice of keeping personal data local. Further, due to its inherent capability for adaptive modeling and long-term planning, reinforcement learning shows potential for building interactive and personalized models, such as interactive recommender systems [111], [112], [113].…”
Section: Model Training and Deployment
confidence: 99%