2020 IEEE Conference on Games (CoG) 2020
DOI: 10.1109/cog47356.2020.9231529

Regression Oracles and Exploration Strategies for Short-Horizon Multi-Armed Bandits

Abstract: This paper explores multi-armed bandit (MAB) strategies in very short horizon scenarios, i.e., when the bandit strategy is only allowed very few interactions with the environment. This is an understudied setting in the MAB literature with many applications in the context of games, such as player modeling. Specifically, we pursue three different ideas. First, we explore the use of regression oracles, which replace the simple average used in strategies such as ε-greedy with linear regression models. Second, we exa…
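The first idea in the abstract, swapping the per-arm average in ε-greedy for a linear regression model, can be illustrated with a minimal sketch. The abstract does not say which features the regression uses, so the choice here (regressing each arm's reward on the global time step and extrapolating to the current step) is an assumption made for illustration, as are the names `fit_line` and `RegressionEpsilonGreedy`. It should not be read as the paper's implementation.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # A single point (or identical x values): fall back to the mean.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


class RegressionEpsilonGreedy:
    """Epsilon-greedy where each arm's value is a regression forecast.

    Assumption (not from the source): the regressor is the global time
    step at which the arm was pulled.
    """

    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.history = [[] for _ in range(n_arms)]  # (step, reward) per arm
        self.t = 0

    def estimate(self, arm):
        obs = self.history[arm]
        if not obs:
            return float("inf")  # untried arms are preferred
        xs = [step for step, _ in obs]
        ys = [reward for _, reward in obs]
        a, b = fit_line(xs, ys)
        return a + b * self.t  # forecast reward at the current step

    def select(self):
        # Explore with probability epsilon, otherwise exploit the forecast.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        estimates = [self.estimate(arm) for arm in range(self.n_arms)]
        return max(range(self.n_arms), key=estimates.__getitem__)

    def update(self, arm, reward):
        self.history[arm].append((self.t, reward))
        self.t += 1
```

With only a handful of pulls, the forecast can disagree with the plain average: an arm whose rewards are rising may be preferred over one with a higher mean but a falling trend. That is the kind of behavior a regression oracle is meant to capture under these assumptions.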

Cited by 4 publications (1 citation statement)
References 20 publications
“…The user's steps are captured by Fitbit and synced automatically with our platform. We use the AI technique of multi-armed bandits to model individual users' social comparison preferences and adapt the comparison targets shown to them [12,13]. For example, if the user model predicts that a particular user tends to prefer upward comparisons, the system will show more profiles with a larger number of daily steps.…”
Section: The Personalization Paradox In Adaptive Exergames (mentioning)
confidence: 99%