2019
DOI: 10.48550/arxiv.1910.03094
Preprint

Combining No-regret and Q-learning

Abstract: Counterfactual Regret Minimization (CFR) has found success in settings like poker which have both terminal states and perfect recall. We seek to understand how to relax these requirements. As a first step, we introduce a simple algorithm, local no-regret learning (LONR), which uses a Q-learning-like update rule to allow learning without terminal states or perfect recall. We prove its convergence for the basic case of MDPs (and limited extensions of them) and present empirical results showing that it achieves l…
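The abstract only names the ingredients (a Q-learning-like value update and no-regret learning); the LONR update itself is not reproduced on this page. As a rough, hypothetical sketch of how those two ingredients can be combined in a tabular MDP, consider the code below; all names (e.g. QRegretAgent) and details are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

# Hypothetical sketch only: a tabular value update in the style of Q-learning,
# with the behaviour policy chosen by regret matching (a no-regret rule).
# This illustrates the abstract's ingredients, not the paper's LONR method.
class QRegretAgent:
    def __init__(self, n_states, n_actions, gamma=0.99, alpha=0.1):
        self.Q = np.zeros((n_states, n_actions))        # state-action values
        self.regrets = np.zeros((n_states, n_actions))  # cumulative regrets
        self.gamma = gamma                               # discount factor
        self.alpha = alpha                               # learning rate

    def policy(self, s):
        # Regret matching: probabilities proportional to positive regret,
        # uniform when no action has positive regret.
        pos = np.maximum(self.regrets[s], 0.0)
        total = pos.sum()
        n = len(pos)
        return pos / total if total > 0 else np.full(n, 1.0 / n)

    def update(self, s, a, r, s_next):
        # Q-learning-like update, except the next-state value is taken under
        # the current regret-matching policy rather than a hard max.
        v_next = np.dot(self.policy(s_next), self.Q[s_next])
        self.Q[s, a] += self.alpha * (r + self.gamma * v_next - self.Q[s, a])
        # Accumulate, for every action at s, its regret relative to the
        # value of the current policy at s.
        self.regrets[s] += self.Q[s] - np.dot(self.policy(s), self.Q[s])
```

A driver loop would sample actions from policy(s), step the environment, and call update on each transition; iterate averaging, which CFR-style analyses typically rely on, is omitted from this sketch.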

Cited by 1 publication (2 citation statements)
References 25 publications
“…In the agent's environment, the agent transitions to another state through the taken action. If the next state could be predicted without knowing/dependent on the preceded events, then the mathematical equation of the property is given in (1).…”
Section: Research Methods, 2.1 The Markov Property
Citation type: mentioning, confidence: 99%
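The citation statement refers to the citing paper's equation (1), which is not reproduced on this page. For reference, a standard statement of the Markov property for an MDP (my notation, not necessarily the citing paper's) is:

```latex
\Pr\bigl(s_{t+1} \mid s_t, a_t\bigr) \;=\; \Pr\bigl(s_{t+1} \mid s_1, a_1, \ldots, s_t, a_t\bigr)
```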
“…Reinforcement learning (RL), one of the machine learning (ML) methods, is utilized by these companies to train an expert agent who outperforms humans in the game. The utilization of the game in training the RL agent is to describe the complex and high-dimensional real-world data [1]- [9]. By utilizing games, RL researchers will be able to evade high experimental costs in training an agent to do intelligence tasks [10].…”
Section: Introduction
Citation type: mentioning, confidence: 99%