Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023
DOI: 10.1145/3539618.3591636

Alleviating Matthew Effect of Offline Reinforcement Learning in Interactive Recommendation

Abstract: Offline reinforcement learning (RL), a technique that learns a policy offline from logged data without interacting with online environments, has become a favorable choice in decision-making processes such as interactive recommendation. Offline RL faces the value overestimation problem. To address it, existing methods employ conservatism, e.g., by constraining the learned policy to be close to behavior policies or penalizing rarely visited state-action pairs. However, when applying such offline RL to …

Cited by 11 publications (1 citation statement)
References 54 publications (77 reference statements)
“…Taking the popularity shift (a.k.a. popularity bias) as an example, user representations may become excessively aligned with popular items, thereby exacerbating the Matthew effect, as demonstrated in [11]. For the convenience of subsequent analysis, we denote L 𝑠𝑚𝑜𝑜𝑡ℎ (𝑢) = E 𝑣∼𝑃 𝑢 E 𝑇 𝑢 E 𝑢 + 𝑑 𝑢 • 𝑔(𝑢, 𝑣; 𝜃 ) as the smoothness regularizer on the specific node 𝑢.…”
Section: Distributionally Robust GNN (mentioning)
confidence: 99%