1983
DOI: 10.1109/tsmc.1983.6313077

Neuronlike adaptive elements that can solve difficult learning control problems

Cited by 2,570 publications (1,355 citation statements)
References 0 publications
“…This reduces the number of paths needed to form a navigational map to around 10 to 30 in rough agreement with experimental results (Morris et al, 1982;. Such an approach of strategy improvement during the exploration phase is similar to the well studied models of reinforcement learning (Barto et al, 1983;Dayan, 1992) and policy iteration in dynamic programming (Dayan, 1996). These approaches develop paths of minimal length by optimization whereas the case studied here involves only random paths and does not necessarily lead to maps that provide the most efficient path to the target.…”
Section: Discussion
supporting, confidence: 80%
“…Many variants of traditional RL exist (e.g., Barto et al, 1983;Watkins, 1989;Watkins and Dayan, 1992;Moore and Atkeson, 1993;Schwartz, 1993;Rummery and Niranjan, 1994;Singh, 1994;Baird, 1995;Kaelbling et al, 1995;Peng and Williams, 1996;Mahadevan, 1996;Tsitsiklis and van Roy, 1996;Bradtke et al, 1996;Santamaría et al, 1997;Prokhorov and Wunsch, 1997;Sutton and Barto, 1998;Wiering and Schmidhuber, 1998b;Baird and Moore, 1999;Meuleau et al, 1999;Morimoto and Doya, 2000;Bertsekas, 2001;Brafman and Tennenholtz, 2002;Abounadi et al, 2002;Lagoudakis and Parr, 2003;Sutton et al, 2008;Maei and Sutton, 2010;van Hasselt, 2012). Most are formulated in a probabilistic framework, and evaluate pairs of input and output (action) events (instead of input events only).…”
Section: Deep FNNs for Traditional RL and Markov Decision Processes
mentioning, confidence: 99%
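The quote above distinguishes methods that evaluate input-output (state, action) pairs from those that evaluate input states alone. The following is a minimal tabular Q-learning sketch of that state-action evaluation; the sizes, learning rate, and helper names are illustrative assumptions, not taken from any of the cited papers.

```python
import numpy as np

# Hypothetical problem sizes and hyperparameters (assumptions for illustration).
n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1

Q = np.zeros((n_states, n_actions))  # value of each (state, action) pair

def q_update(s, a, r, s_next):
    """One Q-learning step: the learned quantity is indexed by the
    (state, action) pair rather than by the state alone."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

def act(s, rng=np.random.default_rng()):
    """Epsilon-greedy action selection over the learned Q-values."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))
```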
“…More specifically, changes in state values V MF (s) imply changes in future reward, and so a change in value induced by an action is a metric that can be used to reinforce behaviours. This forms the core of the actor-critic model (Barto et al, 1983;O'Doherty et al, 2004). Experimentally, it is perhaps most directly demonstrated by conditioned reinforcement experiments (Everitt and Robbins, 2005;Meyer et al, 2012), where instrumental behaviours can be reinforced by Pavlovian CSs.…”
Section: Instrumental Behaviour
mentioning, confidence: 98%
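The quote above describes the actor-critic idea: a change in state value induced by an action serves as the signal that reinforces that action. Below is a minimal tabular sketch of that mechanism; the table sizes, learning rates, and function names are illustrative assumptions and do not reproduce the specific adaptive elements of Barto et al. (1983).

```python
import numpy as np

# Hypothetical problem sizes and learning rates (assumptions for illustration).
n_states, n_actions = 16, 4
alpha_v, alpha_pi, gamma = 0.1, 0.05, 0.99

V = np.zeros(n_states)                    # critic: state values V(s)
prefs = np.zeros((n_states, n_actions))   # actor: action preferences

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def step(s, a, r, s_next):
    """The critic computes the TD error; a positive error (the action led to
    higher expected future reward) reinforces that action in the actor."""
    delta = r + gamma * V[s_next] - V[s]   # change in expected future reward
    V[s] += alpha_v * delta                # critic update
    prefs[s, a] += alpha_pi * delta        # actor update: reinforce or penalize
    return delta

def act(s, rng=np.random.default_rng()):
    """Sample an action from the actor's softmax policy."""
    return int(rng.choice(n_actions, p=softmax(prefs[s])))
```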