1995
DOI: 10.1162/neco.1995.7.5.950

Local and Global Optimization Algorithms for Generalized Learning Automata

Abstract: This paper analyzes the long-term behavior of the REINFORCE and related algorithms (Williams 1986, 1988, 1992) for generalized learning automata (Narendra and Thathachar 1989) for the associative reinforcement learning problem (Barto and Anandan 1985). The learning system considered here is a feedforward connectionist network of generalized learning automata units. We show that REINFORCE is a gradient ascent algorithm but can exhibit unbounded behavior. A modified version of this algorithm, …
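To make the gradient-ascent claim concrete, here is a minimal sketch of one REINFORCE update for a single Bernoulli-logistic unit, the simplest generalized learning automaton considered in this line of work. The names (`reinforce_step`, `alpha`, `reward_fn`) are illustrative, not from the paper; in expectation the update climbs the gradient of the average reward, but nothing constrains the weights, which is the unbounded behavior the paper addresses.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reinforce_step(w, x, reward_fn, alpha=0.1):
    """One REINFORCE update for a Bernoulli-logistic unit.

    With p = sigmoid(w.x) and action a ~ Bernoulli(p), the
    characteristic eligibility d/dw ln g(a; x, w) equals (a - p) * x
    (Williams 1992), so the update is a stochastic gradient-ascent
    step on the expected reward.
    """
    p = sigmoid(w @ x)
    a = float(rng.random() < p)   # sample action a in {0, 1}
    r = reward_fn(a)              # scalar reinforcement from the environment
    w = w + alpha * r * (a - p) * x
    return w, a, r
```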

Cited by 27 publications (20 citation statements)
References 11 publications
“…Some examples of these varieties are finite action set learning automata (FALA) [29], parameterized learning automata (PLA) [29], generalized learning automata (GLA) [30], continuous action set learning automata (CALA) [29], game of LA [29], and network of LA [31]. Here, we explain the learning automata that we used in detail.…”
Section: Learning Automata
confidence: 99%
“…The proposed RL-DSA algorithm is based on the modified REINFORCE methods presented in [20], which have been proven to converge to the global maximum of the reward signal in the long term, owing to the ascent of an appropriate gradient of the average reward and to the inclusion of a perturbation term that allows RL to escape local maxima. In this paper, we focus on the simplest REINFORCE agent, which is based on Bernoulli distributions and logistic functions.…”
Section: A Single RL Agent
confidence: 99%
“…Finally, the third term introduces a perturbation parameter ζ_i(t), which is a zero-mean random variable with variance σ² (e.g., in this paper, ζ_i(t) takes the value +σ or −σ with equal probability, where σ is a positive constant). This term was proposed to give the algorithm the capability of escaping from local maxima and reaching the global maximum of the average reward, given a sufficiently small value of σ and a sufficient number of iterations of the learning loop [20]. Fig.…”
Section: A Single RL Agent
confidence: 99%
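A hedged sketch of how such a perturbation term might enter the update, continuing the Bernoulli-logistic unit above: the ±σ perturbation follows the description quoted here, while the √α scaling relative to the learning rate is an assumption, not necessarily the exact form used in [20].

```python
import numpy as np

rng = np.random.default_rng(1)

def perturbed_reinforce_step(w, x, reward_fn, alpha=0.1, sigma=0.01):
    """REINFORCE step plus a zero-mean random perturbation.

    zeta_i(t) takes +sigma or -sigma with equal probability, matching
    the quoted description; the sqrt(alpha) scaling is an assumed
    coupling to the step size, not necessarily the form in [20].
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x)))
    a = float(rng.random() < p)
    r = reward_fn(a)
    zeta = sigma * rng.choice([-1.0, 1.0], size=w.shape)
    w = w + alpha * r * (a - p) * x + np.sqrt(alpha) * zeta
    return w, a, r
```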
“…It can perform both probability updating and action selection with a low computational complexity that is independent of the number of actions. Also, Phansalkar and Thathachar [10] proposed a learning automata model using constrained optimization techniques. This model overcomes the disadvantage of unbounded behavior in classical gradient ascent learning algorithms and has been successfully applied to local and global optimization problems.…”
Section: Introduction
confidence: 99%
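To illustrate the constrained-optimization idea, here is a sketch of a bounding correction derived from a penalty h(w) that is zero inside a box [−L, L] and grows outside it, so the weights stay in a compact set. The specific penalty used by Phansalkar and Thathachar may differ; treat K, L, and the functional form as assumptions.

```python
import numpy as np

def bounding_correction(w, K=1.0, L=10.0):
    """Negative gradient of an illustrative penalty h(w) that is zero
    on [-L, L] and quadratic in the excess |w| - L outside it, so the
    correction pushes weights back toward the box and keeps the
    iterates bounded.
    """
    excess = np.clip(np.abs(w) - L, 0.0, None)   # distance outside the box
    return -K * np.sign(w) * excess

# Modified update (sketch): the REINFORCE term plus the bounding
# correction, optionally plus the random perturbation from the
# previous snippet:
#   w = w + alpha * (r * (a - p) * x + bounding_correction(w))
```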