Proceedings of the 14th ACM International Conference on Web Search and Data Mining 2021
DOI: 10.1145/3437963.3441764
User Response Models to Improve a REINFORCE Recommender System

Cited by 32 publications (17 citation statements) · References 15 publications
“…In addition, weight capping and self-normalized importance sampling are used to further reduce the variance. Moreover, a large state space and action space will cause sample-inefficiency problems, as REINFORCE relies on the currently sampled trajectories τ. Chen et al. [14] find that an auxiliary loss can help improve the sample efficiency [44,81]. Specifically, a linear projection is applied to the state s_t, the output is combined with the action a_t to compute the auxiliary loss, and this term is appended to the final overall objective function for optimization.…”
Section: Model-free Deep Reinforcement Learning Based Methods
confidence: 99%
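To make the quoted auxiliary-loss idea concrete, the sketch below shows one plausible way to append a response-prediction term, built from a linear projection of the state s_t combined with the action embedding a_t, to a REINFORCE objective. The class and function names, the dot-product scoring, and the loss weight are illustrative assumptions, not the exact formulation of Chen et al. [14].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryResponseHead(nn.Module):
    """Scores a linearly projected state against the action embedding to
    predict the logged user response (assumed binary here)."""

    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.proj = nn.Linear(state_dim, action_dim)  # linear projection of s_t

    def forward(self, state: torch.Tensor, action_emb: torch.Tensor) -> torch.Tensor:
        # Combine the projected state with the action embedding (dot product).
        return (self.proj(state) * action_emb).sum(dim=-1)


def overall_loss(log_prob_a, advantage, aux_logits, response_label, aux_weight=0.1):
    """REINFORCE objective with the auxiliary response term appended."""
    reinforce_loss = -(log_prob_a * advantage).mean()
    aux_loss = F.binary_cross_entropy_with_logits(aux_logits, response_label.float())
    return reinforce_loss + aux_weight * aux_loss
```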
“…Existing DRL-based RS studies built on traditional experience replay methods often exhibit slow convergence. Chen et al. [14] design a user model to improve sample efficiency through auxiliary learning. Specifically, they apply the auxiliary loss to the state representation; the model distinguishes low-activity users and asks the agent to update the recommendation policy more frequently on high-activity users.…”
Section: Sample Efficiency
confidence: 99%
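The activity-based update strategy described above could be realized in several ways; the snippet below is one assumed variant in which trajectories from high-activity users receive a larger weight in the REINFORCE loss. The threshold and the down-weighting factor are illustrative choices, not values from the cited paper.

```python
import torch

def activity_weighted_reinforce_loss(log_probs, advantages, user_activity,
                                     activity_threshold=0.5, low_activity_weight=0.5):
    """REINFORCE loss that up-weights trajectories from high-activity users,
    so the policy is effectively updated on them more frequently.
    Threshold and weights are illustrative only."""
    weights = torch.where(
        user_activity >= activity_threshold,
        torch.ones_like(user_activity),
        torch.full_like(user_activity, low_activity_weight),
    )
    return -(weights * log_probs * advantages).mean()
```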
“…In this work, we adopt a multi-task learning [8] approach for POMDP (inspired by [32] and [4]) to optimise the networks with a combination of a supervised learning classification loss and a Q-learning prediction loss.…”
Section: The Learning Algorithm
confidence: 99%
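For concreteness, the following sketch combines a one-step Q-learning (TD) prediction loss with a supervised classification loss in a single multi-task objective, which is what the quoted passage describes at a high level. The tensor shapes, the MSE form of the TD loss, and the loss weight are assumptions rather than details taken from the cited work.

```python
import torch.nn.functional as F

def multitask_loss(q_values, actions, rewards, next_q_values, dones,
                   class_logits, class_labels, gamma=0.99, sup_weight=1.0):
    """Sum of a one-step Q-learning (TD) prediction loss and a supervised
    classification loss computed from the same shared representation."""
    # Q-learning prediction loss against a bootstrapped TD target.
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_target = rewards + gamma * next_q_values.max(dim=1).values * (1.0 - dones)
    q_loss = F.mse_loss(q_taken, td_target.detach())
    # Supervised classification loss (e.g. predicting the logged user action).
    sup_loss = F.cross_entropy(class_logits, class_labels)
    return q_loss + sup_weight * sup_loss
```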
“…This trend shows that the visual reward r_t^vis is more informative than the ranking percentile reward r_t^per in the EGE (Filter) model on the Shoes dataset, while the ranking percentile reward r_t^per is more important than the visual reward r_t^vis on the Fashion IQ Dress dataset. Such a difference can be attributed to a domain factor of the datasets: the images from the Fashion IQ Dress dataset usually include a human model to display the clothing, while the images from the Shoes dataset only contain shoes without a model (as can be observed in the image databases for shoes 4 and dresses 5). The visual features of the human models can confuse the ResNet component when mapping the dress images to the image feature (ResNet) space.…”
Section: Impact Of Hyper-parameters (RQ3)
confidence: 99%
“…We can leverage auxiliary tasks to improve sampling efficiency. For example, Chen et al. [136] develop a user response model to predict users' positive or negative responses to recommendations. The state and action representations can then be enhanced via these predicted responses.…”
Section: Sampling Efficiency
confidence: 99%
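A user response model of the kind described here might feed its prediction back into the representations the agent consumes. The snippet below is a minimal, assumed sketch: a small MLP predicts a positive/negative response for a state-action pair and the predicted probability is concatenated onto the state features. The architecture, layer sizes, and the concatenation scheme are illustrative only and are not taken from the cited paper.

```python
import torch
import torch.nn as nn

class ResponseAwareState(nn.Module):
    """Predicts a positive/negative user response for a state-action pair and
    appends the predicted probability to the state representation."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.response_head = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, action_emb: torch.Tensor) -> torch.Tensor:
        p_positive = torch.sigmoid(
            self.response_head(torch.cat([state, action_emb], dim=-1)))
        # Enhanced state: original features plus the predicted response signal.
        return torch.cat([state, p_positive], dim=-1)
```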