2007
DOI: 10.1016/j.geb.2007.01.005

Transient and asymptotic dynamics of reinforcement learning in games

Cited by 57 publications (36 citation statements)
References 50 publications
“…Such rules satisfy the above condition in a trivial manner because the expected distribution tomorrow is the same as today. 19 We conclude, therefore, that such a property is too restrictive. Restricting the set of environments on which the improvement is required would lead us to identify a larger class of learning rules.…”
Section: Discussion (mentioning)
confidence: 78%
“…Hence, our analysis provides a similar level of generality 18 See Claim 1 in the Appendix for the proof. 19 It can also be shown that unbiased learning rules are the only learning rules which are continuous in x for all a, a ∈ A and satisfy $\sum_a (\alpha_a(s) + f_a(s)) F_a \ \text{sosd} \ \sum_a \alpha_a(s) F_a$ in every environment. For details, see Claim 2 in the Appendix.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, very different update rules, with a clear learning interpretation, lead to other outcomes. It is worth quoting work in progress by Galán, Izquierdo, Santos, and Sánchez (2010), who elaborate on previous research on two-player games (Izquierdo, Izquierdo, Gotts, & Polhill, 2007;Izquierdo, Izquierdo, & Gotts, 2008) with reinforcement learning dynamics as used by Macy and Flache (2002). Using the Prisoner's Dilemma as an example, they observe that the asymptotic state of a two-player game with this dynamics is full cooperation for both players with transients around a mixed strategy equilibrium.…”
Section: Strategy Update and Cooperation In Social Network 15 (mentioning)
confidence: 98%
“…Polhill et al., 2001; Gotts et al., 2003), stochastic process theory (Izquierdo et al., 2008a), mean-field approximations (see e.g. Galán & Izquierdo, 2005; Izquierdo et al., 2007), interactive data visualizations (see e.g. Izquierdo et al., 2008a; Izquierdo et al., 2008b), or simple inspection.…”
Section: Computational Modeling Scheme for Complex Systems (unclassified)