Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2022
DOI: 10.1145/3477495.3531716

State Encoders in Reinforcement Learning for Recommendation

Cited by 8 publications (6 citation statements); references 27 publications.
“…It can be implemented as any sequential model, such as recurrent neural network (RNN)-based models [44], convolutional models [40, 56], or Transformer-based methods [12, 25, 51]. Huang et al. [22] investigated the performance of different state encoders in RL-based recommenders. We use a naive average layer as the state tracker since it requires the least training time but nonetheless outperforms many complex encoders [22].…”
Section: The DORL Method
Citation type: mentioning (confidence: 99%)
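As context for the state-encoder discussion in the excerpt above, a minimal sketch of such a naive average state tracker is given below. This is an illustrative reconstruction, not code from the cited papers; the class name `AverageStateEncoder`, the padding convention, and the dimension choices are assumptions.

```python
import torch
import torch.nn as nn

class AverageStateEncoder(nn.Module):
    """Naive average state tracker: the RL state is the mean of the embeddings
    of the items the user has interacted with so far.

    Hypothetical sketch; masking and dimensions are assumptions, not the
    implementation used in the cited papers.
    """

    def __init__(self, num_items: int, embed_dim: int = 64):
        super().__init__()
        # Index 0 is reserved as padding for short interaction histories.
        self.item_embedding = nn.Embedding(num_items + 1, embed_dim, padding_idx=0)

    def forward(self, item_seq: torch.LongTensor) -> torch.Tensor:
        # item_seq: (batch, seq_len) item IDs, padded with 0.
        emb = self.item_embedding(item_seq)               # (batch, seq_len, embed_dim)
        mask = (item_seq != 0).unsqueeze(-1).float()      # (batch, seq_len, 1)
        summed = (emb * mask).sum(dim=1)                  # (batch, embed_dim)
        count = mask.sum(dim=1).clamp(min=1.0)            # avoid division by zero
        return summed / count                             # mean over non-padded items


# Usage: encode two padded interaction histories into state vectors
# that a policy or Q-network could consume.
encoder = AverageStateEncoder(num_items=1000)
histories = torch.tensor([[3, 17, 42, 0, 0], [5, 0, 0, 0, 0]])
state = encoder(histories)  # shape (2, 64)
```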
“…Huang et al. [22] investigated the performance of different state encoders in RL-based recommenders. We use a naive average layer as the state tracker since it requires the least training time but nonetheless outperforms many complex encoders [22]. It can be written as:…”
Section: The DORL Method
Citation type: mentioning (confidence: 99%)
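The equation itself is truncated in the excerpt above ("It can be written as:…"). A plausible form of a naive average state tracker, given here only as an illustration and not taken from the cited paper, is

\mathbf{s}_t = \frac{1}{t} \sum_{i=1}^{t} \mathbf{e}_i,

where \mathbf{e}_i denotes the embedding of the i-th item the user has interacted with and \mathbf{s}_t is the state after t interactions.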
“…Finally, we define the negative sampling and rewards that are suitable for this MMIR scenario (Section 3.3). [1, 9, 22, 32]. In this scenario, the users' interactions with the recommended items (actions) are returned as feedback (the so-called observations from the environments, such as views, clicks, skips, purchases, and ratings) to the recommendation agents, which usually convert the users' feedback into a reward signal [22].…”
Section: The GOMMIR Model
Citation type: mentioning (confidence: 99%)
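The last excerpt describes how user feedback (views, clicks, skips, purchases, ratings) is converted into a reward signal for the recommendation agent. A minimal sketch of one such conversion is shown below; the event names, reward values, and rating scale are illustrative assumptions, not the scheme of the cited work.

```python
from typing import Optional

# Hypothetical mapping from feedback events to scalar rewards for an
# RL-based recommender. The event names and values are illustrative only.
FEEDBACK_REWARD = {
    "skip": -0.2,
    "view": 0.0,
    "click": 0.5,
    "purchase": 1.0,
}

def feedback_to_reward(event: str, rating: Optional[float] = None) -> float:
    """Convert a single observed feedback event into a reward.

    If an explicit rating is present (assumed 1-5 stars here), rescale it to
    [0, 1]; otherwise look the event type up in the reward table.
    """
    if rating is not None:
        return (rating - 1.0) / 4.0
    return FEEDBACK_REWARD.get(event, 0.0)


# Usage: turn a short interaction trajectory into per-step rewards.
trajectory = [("view", None), ("click", None), ("purchase", None), ("skip", None)]
rewards = [feedback_to_reward(event, rating) for event, rating in trajectory]
print(rewards)  # [0.0, 0.5, 1.0, -0.2]
```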