Proceedings of the 13th International Conference on Web Search and Data Mining 2020
DOI: 10.1145/3336191.3371801

Pseudo Dyna-Q

Cited by 94 publications (25 citation statements)
References 28 publications
“…Model-based models. Model-based models represent the whole environment by learning a fixed model [6,43,48,50]. However, in recommendation scenario, the model's representation of the environment will be biased due to the dynamic changes in the environment.…”
Section: Reinforcement Learning Based Recommendation
Citation type: mentioning
confidence: 99%
“…Model-based RL. These methods directly model the environment dynamics [2,8,45], which improves sample efficiency. However, it is very difficult to estimate the state transition in many real-world recommendation tasks with large state and action spaces [1].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…However, it is very challenging to apply RL to industrial-scale recommender systems serving a tremendous amount of users with diverse preferences concerning a huge item corpus [5]. As RL-based recommender systems often treat users as states and items as actions, the state and action spaces are extremely large (typically in millions or billions), making classic RL methods rather sample inefficient [2,8,45]. In classic RL scenarios, one way to improve sample efficiency is to adopt model-based RL, which directly models the environment dynamics [20,30].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Relating buyers' impression to a potential seller via buyer used search words [28]. Similar use cases of relating customer feedback to items through Q-learning and self-attention mechanism [29]. Graph scenarios have been approached using a knowledge graph to discover high-quality negative signals from implicit feedback to interpret user intent [30].…”
Section: Related Work
Citation type: mentioning
confidence: 99%