Proceedings of the 14th ACM International Conference on Web Search and Data Mining 2021
DOI: 10.1145/3437963.3441764
User Response Models to Improve a REINFORCE Recommender System

Cited by 32 publications (17 citation statements) · References 15 publications
“…In addition, weight capping and self-normalized importance sampling are used to further reduce the variance. Moreover, a large state space and action space will cause sample-inefficiency problems, as REINFORCE relies on the currently sampled trajectories τ. Chen et al. [14] find that an auxiliary loss can help improve the sample efficiency [44,81]. Specifically, a linear projection is applied to the state s_t, the output is combined with the action a_t to compute the auxiliary loss, and this term is appended to the final overall objective function for optimization.…”
Section: Model-free Deep Reinforcement Learning Based Methods
confidence: 99%
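To make the quoted auxiliary-loss idea concrete, the sketch below shows one plausible way to append a response-prediction term, built from a linear projection of the state s_t combined with the action embedding a_t, to a REINFORCE objective. The class and function names, the dot-product scoring, and the loss weight are illustrative assumptions, not the exact formulation of Chen et al. [14].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxiliaryResponseHead(nn.Module):
    """Scores a linearly projected state against the action embedding to
    predict the logged user response (assumed binary here)."""

    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.proj = nn.Linear(state_dim, action_dim)  # linear projection of s_t

    def forward(self, state: torch.Tensor, action_emb: torch.Tensor) -> torch.Tensor:
        # Combine the projected state with the action embedding (dot product).
        return (self.proj(state) * action_emb).sum(dim=-1)


def overall_loss(log_prob_a, advantage, aux_logits, response_label, aux_weight=0.1):
    """REINFORCE objective with the auxiliary response term appended."""
    reinforce_loss = -(log_prob_a * advantage).mean()
    aux_loss = F.binary_cross_entropy_with_logits(aux_logits, response_label.float())
    return reinforce_loss + aux_weight * aux_loss
```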
“…Existing DRL-based RS studies built on traditional experience replay methods often exhibit slow convergence. Chen et al. [14] design a user model to improve sample efficiency through auxiliary learning. Specifically, they apply the auxiliary loss to the state representation; the model distinguishes low-activity users and asks the agent to update the recommendation policy more frequently on high-activity users.…”
Section: Sample Efficiency
confidence: 99%
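The activity-based update strategy described above could be realized in several ways; the snippet below is one assumed variant in which trajectories from high-activity users receive a larger weight in the REINFORCE loss. The threshold and the down-weighting factor are illustrative choices, not values from the cited paper.

```python
import torch

def activity_weighted_reinforce_loss(log_probs, advantages, user_activity,
                                     activity_threshold=0.5, low_activity_weight=0.5):
    """REINFORCE loss that up-weights trajectories from high-activity users,
    so the policy is effectively updated on them more frequently.
    Threshold and weights are illustrative only."""
    weights = torch.where(
        user_activity >= activity_threshold,
        torch.ones_like(user_activity),
        torch.full_like(user_activity, low_activity_weight),
    )
    return -(weights * log_probs * advantages).mean()
```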
“…In this work, we adopt a multi-task learning [8] approach for POMDP (inspired by [32] and [4]) to optimise the networks with a combination of a supervised learning classification loss and a Q-learning prediction loss.…”
Section: The Learning Algorithm
confidence: 99%
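For concreteness, the following sketch combines a one-step Q-learning (TD) prediction loss with a supervised classification loss in a single multi-task objective, which is what the quoted passage describes at a high level. The tensor shapes, the MSE form of the TD loss, and the loss weight are assumptions rather than details taken from the cited work.

```python
import torch.nn.functional as F

def multitask_loss(q_values, actions, rewards, next_q_values, dones,
                   class_logits, class_labels, gamma=0.99, sup_weight=1.0):
    """Sum of a one-step Q-learning (TD) prediction loss and a supervised
    classification loss computed from the same shared representation."""
    # Q-learning prediction loss against a bootstrapped TD target.
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_target = rewards + gamma * next_q_values.max(dim=1).values * (1.0 - dones)
    q_loss = F.mse_loss(q_taken, td_target.detach())
    # Supervised classification loss (e.g. predicting the logged user action).
    sup_loss = F.cross_entropy(class_logits, class_labels)
    return q_loss + sup_weight * sup_loss
```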
“…This trend shows that the visual reward r_t^vis is more informative than the ranking percentile reward r_t^per in the EGE (Filter) model on the Shoes dataset, while the ranking percentile reward r_t^per is more important than the visual reward r_t^vis on the Fashion IQ Dress dataset. Such a difference can be attributed to a domain factor of the datasets: the images from the Fashion IQ Dress dataset usually include a human model to display the clothing, while the images from the Shoes dataset only contain shoes without a model (as can be observed in the image databases for shoes 4 and dresses 5). The visual features of the human models can confuse the ResNet component when mapping the dress images to the image feature (ResNet) space.…”
Section: Impact Of Hyper-parameters (RQ3)
confidence: 99%
“…We can leverage auxiliary tasks to improve sampling efficiency. For example, Chen et al. [136] develop a user response model to predict users' positive or negative responses to recommendations. The state and action representations can then be enhanced via these predicted responses.…”
Section: Sampling Efficiency
confidence: 99%
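A user response model of the kind described here might feed its prediction back into the representations the agent consumes. The snippet below is a minimal, assumed sketch: a small MLP predicts a positive/negative response for a state-action pair and the predicted probability is concatenated onto the state features. The architecture, layer sizes, and the concatenation scheme are illustrative only and are not taken from the cited paper.

```python
import torch
import torch.nn as nn

class ResponseAwareState(nn.Module):
    """Predicts a positive/negative user response for a state-action pair and
    appends the predicted probability to the state representation."""

    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.response_head = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, action_emb: torch.Tensor) -> torch.Tensor:
        p_positive = torch.sigmoid(
            self.response_head(torch.cat([state, action_emb], dim=-1)))
        # Enhanced state: original features plus the predicted response signal.
        return torch.cat([state, p_positive], dim=-1)
```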