Transient and asymptotic dynamics of reinforcement learning in games

Izquierdo, Luis R.; Izquierdo, Segismundo S.; Gotts, Nick; Polhill, Gary

doi:10.1016/j.geb.2007.01.005

Cited by 57 publications

(36 citation statements)

References 50 publications


“…Such rules satisfy the above condition in a trivial manner because the expected distribution tomorrow is the same as today.[19] We conclude, therefore, that such a property is too restrictive. Restricting the set of environments on which the improvement is required would lead us to identify a larger class of learning rules.…”

Section: Discussion (mentioning)

confidence: 78%

“…Hence, our analysis provides a similar level of generality… [18] See Claim 1 in the Appendix for the proof. [19] It can also be shown that unbiased learning rules are the only learning rules which are continuous in x for all a, a′ ∈ A and satisfy $\sum_{a} (\alpha_a(s) + f_a(s)) F_a \succsim_{\mathrm{SOSD}} \sum_{a} \alpha_a(s) F_a$ in every environment. For details, see Claim 2 in the Appendix.…”

Section: Discussion (mentioning)

confidence: 99%


Learning and risk aversion

Oyarzun

Sarin

2013

Journal of Economic Theory


We study the manner in which learning shapes behavior towards risk when individuals are not assumed to know, or to have beliefs about, probability distributions. In any period, the behavior change induced by learning is assumed to depend on the action chosen and the payoff obtained. We characterize learning processes that, in expected value, increase the probability of choosing the safest (or riskiest) actions and provide sufficient conditions for them to converge, in the long run, to the choices of risk-averse (or risk-seeking) expected utility maximizers. We provide a learning-theoretic motivation for long-run risk choices, such as those in expected utility theory with known payoff distributions.


“…However, very different update rules, with a clear learning interpretation, lead to other outcomes. It is worth quoting work in progress by Galán, Izquierdo, Santos, and Sánchez (2010), who elaborate on previous research on two-player games (Izquierdo, Izquierdo, Gotts, & Polhill, 2007; Izquierdo, Izquierdo, & Gotts, 2008) with reinforcement learning dynamics as used by Macy and Flache (2002). Using the Prisoner's Dilemma as an example, they observe that the asymptotic state of a two-player game with this dynamics is full cooperation for both players with transients around a mixed strategy equilibrium.…”

Section: Strategy Update and Cooperation in Social Network (mentioning)

confidence: 98%

Individual Strategy Update and Emergence of Cooperation in Social Networks

Roca

Sánchez

Cuesta

2012

The Journal of Mathematical Sociology
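The reinforcement learning dynamics of Macy and Flache (2002) referred to in the Roca, Sánchez, and Cuesta quotation can be illustrated with a Bush-Mosteller-style update in a two-player Prisoner's Dilemma. This is a minimal sketch under stated assumptions, not the authors' implementation: the payoff values (4, 3, 1, 0), the aspiration level of 2, the normalising constant of 2, the learning rate, and the helper names `stimulus`, `bm_update` and `simulate` are all hypothetical choices made for this example. Each player raises the probability of the action just taken when its payoff exceeds the aspiration level and lowers it otherwise.

```python
import random

# Hypothetical Prisoner's Dilemma payoffs, keyed by (own action, other's action):
# Temptation = 4, Reward = 3, Punishment = 1, Sucker = 0 (illustrative values only).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}


def stimulus(payoff, aspiration=2.0, scale=2.0):
    """Normalised satisfaction in [-1, 1]: positive when the payoff beats the
    (assumed) aspiration level; scale is the largest |payoff - aspiration|."""
    return (payoff - aspiration) / scale


def bm_update(p_coop, action, s, learning_rate):
    """Bush-Mosteller-style update of the probability of cooperating.

    The probability of the action just taken moves towards 1 when s >= 0 and
    towards 0 when s < 0, by a fraction learning_rate * |s| of the distance.
    """
    p_action = p_coop if action == "C" else 1.0 - p_coop
    if s >= 0:
        p_action += (1.0 - p_action) * learning_rate * s
    else:
        p_action += p_action * learning_rate * s
    # Convert back to the probability of cooperating.
    return p_action if action == "C" else 1.0 - p_action


def simulate(p1=0.5, p2=0.5, learning_rate=0.5, rounds=1000, seed=0):
    """Play the repeated game; return both players' final cooperation probabilities."""
    rng = random.Random(seed)
    for _ in range(rounds):
        a1 = "C" if rng.random() < p1 else "D"
        a2 = "C" if rng.random() < p2 else "D"
        p1 = bm_update(p1, a1, stimulus(PAYOFF[(a1, a2)]), learning_rate)
        p2 = bm_update(p2, a2, stimulus(PAYOFF[(a2, a1)]), learning_rate)
    return p1, p2
```

For example, `simulate(learning_rate=0.5)` returns the two cooperation probabilities after 1,000 rounds. Under these assumed payoffs, mutual cooperation yields a positive stimulus for both players and therefore reinforces itself. Rerunning with smaller values of `learning_rate` is one way to look at the transient behaviour that the quotation contrasts with the asymptotic outcome.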


“…Polhill et al., 2001; Gotts et al., 2003), stochastic process theory (Izquierdo et al., 2008a), mean-field approximations (see e.g. Galán & Izquierdo, 2005; Izquierdo et al., 2007), interactive data visualizations (see e.g. Izquierdo et al., 2008a; Izquierdo et al., 2008b), or simple inspection.…”

Section: A Computational Modeling Scheme for Complex Systems (unclassified)