Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2022
DOI: 10.1145/3477495.3531716

State Encoders in Reinforcement Learning for Recommendation

Cited by 8 publications (6 citation statements); references 27 publications.
“…It can be implemented as any sequential model, such as recurrent neural network (RNN)-based models [44], convolutional models [40, 56], or Transformer-based methods [12, 25, 51]. Huang et al. [22] investigated the performance of different state encoders in RL-based recommenders. We use a naive average layer as the state tracker since it requires the least training time but nonetheless outperforms many complex encoders [22].…”
Section: The DORL Method
Citation type: mentioning (confidence: 99%)
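As context for the state-encoder discussion in the excerpt above, a minimal sketch of such a naive average state tracker is given below. This is an illustrative reconstruction, not code from the cited papers; the class name `AverageStateEncoder`, the padding convention, and the dimension choices are assumptions.

```python
import torch
import torch.nn as nn

class AverageStateEncoder(nn.Module):
    """Naive average state tracker: the RL state is the mean of the embeddings
    of the items the user has interacted with so far.

    Hypothetical sketch; masking and dimensions are assumptions, not the
    implementation used in the cited papers.
    """

    def __init__(self, num_items: int, embed_dim: int = 64):
        super().__init__()
        # Index 0 is reserved as padding for short interaction histories.
        self.item_embedding = nn.Embedding(num_items + 1, embed_dim, padding_idx=0)

    def forward(self, item_seq: torch.LongTensor) -> torch.Tensor:
        # item_seq: (batch, seq_len) item IDs, padded with 0.
        emb = self.item_embedding(item_seq)               # (batch, seq_len, embed_dim)
        mask = (item_seq != 0).unsqueeze(-1).float()      # (batch, seq_len, 1)
        summed = (emb * mask).sum(dim=1)                  # (batch, embed_dim)
        count = mask.sum(dim=1).clamp(min=1.0)            # avoid division by zero
        return summed / count                             # mean over non-padded items


# Usage: encode two padded interaction histories into state vectors
# that a policy or Q-network could consume.
encoder = AverageStateEncoder(num_items=1000)
histories = torch.tensor([[3, 17, 42, 0, 0], [5, 0, 0, 0, 0]])
state = encoder(histories)  # shape (2, 64)
```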
“…Huang et al. [22] investigated the performance of different state encoders in RL-based recommenders. We use a naive average layer as the state tracker since it requires the least training time but nonetheless outperforms many complex encoders [22]. It can be written as:…”
Section: The DORL Method
Citation type: mentioning (confidence: 99%)
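The equation itself is truncated in the excerpt above ("It can be written as:…"). A plausible form of a naive average state tracker, given here only as an illustration and not taken from the cited paper, is

\mathbf{s}_t = \frac{1}{t} \sum_{i=1}^{t} \mathbf{e}_i,

where \mathbf{e}_i denotes the embedding of the i-th item the user has interacted with and \mathbf{s}_t is the state after t interactions.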
“…Finally, we define the negative sampling and rewards that are suitable for this MMIR scenario (Section 3.3). [1, 9, 22, 32]. In this scenario, the users' interactions with the recommended items (actions) are returned as feedback (the so-called observations from the environments, such as views, clicks, skips, purchases, and ratings) to the recommendation agents, which usually convert the users' feedback into a reward signal [22].…”
Section: The GOMMIR Model
Citation type: mentioning (confidence: 99%)
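The last excerpt describes how user feedback (views, clicks, skips, purchases, ratings) is converted into a reward signal for the recommendation agent. A minimal sketch of one such conversion is shown below; the event names, reward values, and rating scale are illustrative assumptions, not the scheme of the cited work.

```python
from typing import Optional

# Hypothetical mapping from feedback events to scalar rewards for an
# RL-based recommender. The event names and values are illustrative only.
FEEDBACK_REWARD = {
    "skip": -0.2,
    "view": 0.0,
    "click": 0.5,
    "purchase": 1.0,
}

def feedback_to_reward(event: str, rating: Optional[float] = None) -> float:
    """Convert a single observed feedback event into a reward.

    If an explicit rating is present (assumed 1-5 stars here), rescale it to
    [0, 1]; otherwise look the event type up in the reward table.
    """
    if rating is not None:
        return (rating - 1.0) / 4.0
    return FEEDBACK_REWARD.get(event, 0.0)


# Usage: turn a short interaction trajectory into per-step rewards.
trajectory = [("view", None), ("click", None), ("purchase", None), ("skip", None)]
rewards = [feedback_to_reward(event, rating) for event, rating in trajectory]
print(rewards)  # [0.0, 0.5, 1.0, -0.2]
```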