Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/360

SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets

Abstract: Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items (which may have interacting effects on user choice), methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term valu…
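The abstract breaks off before stating the decomposition itself. The sketch below is a minimal illustration, not code from the paper. It assumes that the value of a slate A in state s splits into item-wise values as Q(s, A) = P(⊥ | s, A)·Q̄(s, ⊥) + Σ_{i∈A} P(i | s, A)·Q̄(s, i), where ⊥ is the no-click outcome, and that users choose according to a conditional-logit model. The function names (choice_probs, slate_value, best_slate, td_update) are hypothetical, values are tabulated for a single state, and the exhaustive slate search is only practical for tiny catalogues.

```python
import itertools
import math


def choice_probs(scores, slate, null_score=0.0):
    # Assumed conditional-logit choice model:
    # P(i | s, A) = e^{v_i} / (e^{v_null} + sum_{j in A} e^{v_j})
    weights = {i: math.exp(scores[i]) for i in slate}
    denom = math.exp(null_score) + sum(weights.values())
    return {i: w / denom for i, w in weights.items()}


def slate_value(qbar, scores, slate, null_score=0.0, q_null=0.0):
    # Assumed decomposition: slate value = P(no click) * Qbar(null)
    #                        + sum over items of P(i | s, A) * Qbar(s, i)
    probs = choice_probs(scores, slate, null_score)
    p_null = 1.0 - sum(probs.values())
    return p_null * q_null + sum(p * qbar[i] for i, p in probs.items())


def best_slate(qbar, scores, k, null_score=0.0, q_null=0.0):
    # Exhaustive search over all k-item slates (illustration only; the number
    # of candidate slates grows combinatorially with catalogue size).
    return max(
        itertools.combinations(sorted(qbar), k),
        key=lambda a: slate_value(qbar, scores, a, null_score, q_null),
    )


def td_update(qbar, clicked, reward, next_value, alpha=0.1, gamma=0.9):
    # Hypothetical SARSA-style step: only the consumed item's item-wise value
    # moves toward r + gamma * Q(s', A'), where next_value is the decomposed
    # value of the next slate.
    target = reward + gamma * next_value
    qbar[clicked] += alpha * (target - qbar[clicked])
    return qbar[clicked]
```

For example, with Q̄ = {a: 1.0, b: 2.0, c: 0.5}, scores {a: 0, b: 0, c: 1}, and k = 2, best_slate returns ("a", "b") with value 1.0, even though c has the highest choice score: its higher click probability does not make up for its low item-wise value. The appeal of such a decomposition is that learning only needs one value per item, Q̄(s, i), rather than one per possible slate.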

Cited by 91 publications (81 citation statements); references 17 publications.
“…Other enhancements include incorporating contextual data [5]. Most recently, Chen et al [10] and Ie et al [23] showed success in applying reinforcement learning techniques in YouTube recommender systems. Our work does not deal with designing a recommender system, nor does it attempt to reverse engineer the YouTube recommender.…”
Section: Recommender Systems and Video Recommendation (mentioning)
“…The first gap measures and estimates the effects of recommender systems in complex social systems. The main goals of recommender systems are maximizing the chance that a user clicks on an item in the next step [4,16,17,48] or in a longer time horizon [5,10,23]. However, recommendation in social systems remains as an open problem for two reasons: (1) a limited conceptual understanding of how finite human attention is allocated over the network of content, in which some items gain popularity at the expense of, or with the assistance of others; (2) the computational challenge of jointly recommending a large collection of items.…”
Section: Introduction (mentioning)
“…The current trend in this direction is to take into account complex user behaviours and knowledge graph information to achieve high efficiency with a large amount of data and large number of items [151]. The application of reinforcement learning techniques in industrial recommender systems is also prevalent, such as in YouTube [152] and Alibaba [153]. The development of deep reinforcement learning-based recommender systems will continue to be a hot area and will be more heavily driven by real-world industrial applications.…”
Section: Reinforcement Learning in Recommender Systems (mentioning)
“…Methods given in [10], [15]- [19], [30] explicitly consider the impacts of the co-displayed items when generating recommendation lists. Some of them [10], [15]- [17] are re-ranking methods.…”
Section: Preliminaries (mentioning)
“…We independently develop a similar idea as [17], where the main difference is that we try to replace the original ranking mechanism in the system with new strategies to generate lists directly as [18]- [20]. The work in [18] is trying to optimize a simplified objective, and the one in [19] makes additional assumptions when solving the list recommendation task. For the method in [20], it is based on conditional variational auto-encoder (CVAE) [21].…”
Section: Introduction (mentioning)