In this letter, the back-propagation algorithm with a momentum term is analyzed. It is shown that all local minima of the sum-of-squares error are stable equilibrium points of the algorithm, and that all other equilibrium points are unstable.
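A minimal sketch of the stability claim, assuming the usual heavy-ball form of the momentum update, Δw(t) = −η ∂E/∂w + α Δw(t−1), and a hypothetical one-weight squared-error surface E(w) = ½(w² − 1)² that is not taken from the letter. This E has minima at w = ±1 and a non-minimum equilibrium (a local maximum) at w = 0. The names `grad_E` and `backprop_momentum` and the values of η and α are illustrative assumptions.

```python
def grad_E(w: float) -> float:
    """dE/dw for the hypothetical error E(w) = 0.5 * (w**2 - 1)**2."""
    # Minima at w = +1 and w = -1; w = 0 is an equilibrium but a local maximum.
    return 2.0 * w * (w * w - 1.0)


def backprop_momentum(w0: float, eta: float = 0.05, alpha: float = 0.5,
                      steps: int = 2000) -> float:
    """Run dw(t) = -eta * dE/dw + alpha * dw(t-1) from w0; return the final weight."""
    w, dw = w0, 0.0
    for _ in range(steps):
        dw = -eta * grad_E(w) + alpha * dw  # gradient step plus momentum term
        w += dw
    return w


if __name__ == "__main__":
    for start in (0.0, 1e-6, -1e-6, 0.5):
        print(start, "->", backprop_momentum(start))
```

Started exactly at w = 0 the iteration stays there, but any small perturbation carries it to the nearest minimum, where it settles; that is the stable/unstable split the abstract describes, shown here only for this toy surface.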
A feedforward network composed of units of teams of parameterized learning automata is considered as a model of a reinforcement learning system. The internal state vector of each learning automaton is updated using an algorithm consisting of a gradient-following term and a random perturbation term. It is shown that the algorithm weakly converges to a solution of the Langevin equation, implying that the algorithm globally maximizes an appropriate function. The algorithm is decentralized: the units exchange no information during updating. Simulation results on common payoff games and pattern recognition problems show that reasonable rates of convergence can be obtained.
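To show what the random perturbation term contributes, here is a minimal sketch that is not the paper's automaton update: it swaps the reinforcement-based gradient estimate for the exact derivative of a hypothetical one-dimensional objective f(u) = −(u² − 1)² + 0.3u, which has a local maximum near u ≈ −0.96 and its global maximum near u ≈ 1.04. The iteration u ← u + b f′(u) + √b σ ζ, with ζ standard normal, is an Euler discretization of the Langevin equation du = f′(u) dt + σ dW, whose stationary density is proportional to exp(2f(u)/σ²) and so concentrates around the global maximum. The function names, step size b, and σ are illustrative assumptions.

```python
import math
import random


def df(u: float) -> float:
    """Derivative of the hypothetical objective f(u) = -(u**2 - 1)**2 + 0.3*u."""
    # f has a local maximum near u = -0.96 and its global maximum near u = 1.04.
    return -4.0 * u * (u * u - 1.0) + 0.3


def perturbed_ascent(u0: float, b: float = 0.01, sigma: float = math.sqrt(0.5),
                     steps: int = 200_000, seed: int = 0):
    """Iterate u <- u + b*f'(u) + sqrt(b)*sigma*N(0, 1).

    Returns the final state and the fraction of steps spent at u > 0,
    which approximately covers the basin of the global maximum.
    """
    rng = random.Random(seed)
    u, in_global_basin = u0, 0
    for _ in range(steps):
        # gradient-following term plus a scaled Gaussian perturbation
        u += b * df(u) + math.sqrt(b) * sigma * rng.gauss(0.0, 1.0)
        if u > 0.0:
            in_global_basin += 1
    return u, in_global_basin / steps


if __name__ == "__main__":
    print(perturbed_ascent(-1.0, sigma=0.0, steps=10_000))  # pure gradient ascent
    print(perturbed_ascent(-1.0))                            # with perturbation
```

Started at the local maximum, pure gradient ascent (σ = 0) stays there, while the perturbed iteration escapes and spends most of its time near the global maximum.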
This paper analyzes the long-term behavior of the REINFORCE and related algorithms (Williams 1986, 1988, 1992) for generalized learning automata (Narendra and Thathachar 1989) applied to the associative reinforcement learning problem (Barto and Anandan 1985). The learning system considered here is a feedforward connectionist network of generalized learning automata units. We show that REINFORCE is a gradient ascent algorithm but can exhibit unbounded behavior. A modified version of this algorithm, based on constrained optimization techniques, is suggested to overcome this disadvantage. The modified algorithm is shown to exhibit local optimization properties. A global version of the algorithm, based on constant-temperature heat bath techniques, is also described and shown to converge to the global maximum. All algorithms are analyzed using weak convergence techniques.
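For a single Bernoulli-logistic unit with firing probability p = σ(w·x), Williams' REINFORCE rule takes the form Δw_j = α (r − b)(y − p) x_j, where y ∈ {0, 1} is the sampled output and b a reinforcement baseline. The sketch below applies that rule to one unit with a bias input and a hypothetical reward that always favors y = 1 (baseline 0). It is an illustration of the unbounded behavior mentioned above, not the paper's network or its constrained modification; `reinforce_bernoulli` and its parameters are illustrative names.

```python
import math
import random


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def reinforce_bernoulli(w, x, reward, alpha=0.1, baseline=0.0, steps=5000, seed=0):
    """Train one Bernoulli-logistic unit with REINFORCE; return the weight history."""
    rng = random.Random(seed)
    w = list(w)
    history = [list(w)]
    for _ in range(steps):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        y = 1 if rng.random() < p else 0  # sample the unit's binary output
        r = reward(y)
        # REINFORCE for this unit: dw_j = alpha * (r - b) * (y - p) * x_j
        w = [wi + alpha * (r - baseline) * (y - p) * xi for wi, xi in zip(w, x)]
        history.append(list(w))
    return history


if __name__ == "__main__":
    hist = reinforce_bernoulli([0.0], [1.0], lambda y: 1.0 if y == 1 else 0.0)
    print(hist[len(hist) // 2][0], hist[-1][0])
```

With this reward the expected update α p(1 − p) is positive for every finite w, so there is no finite fixed point: p approaches 1 while the weight keeps growing.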
Learning algorithms for feedforward connectionist systems in a reinforcement learning environment are developed and analyzed in this paper. The connectionist system is made of units of groups of learning automata. The learning algorithm used is the linear reward-inaction (L_R-I) algorithm, and its asymptotic behavior is approximated by an ordinary differential equation (ODE) for small values of the learning parameter. This is done using weak convergence techniques. The reinforcement learning model is used to pose the goal of the system as a constrained optimization problem. It is shown that the ODE, and hence the algorithm, exhibits local convergence properties, converging to local solutions of the related optimization problem. A three-layer pattern recognition network is used as an example to show that the system behaves as predicted and that reasonable rates of convergence are obtained. Simulations also show that the algorithm is robust to noise.
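The L_R-I scheme updates an automaton's action-probability vector p after it selects action i and receives reinforcement β ∈ [0, 1]: p ← p + λβ(e_i − p), where e_i is the i-th unit vector. When β = 0 the probabilities are left unchanged (the "inaction"). The sketch below runs one automaton against a hypothetical stationary environment with fixed reward probabilities; the paper's network of automaton groups and its shared reinforcement are not modeled, and `lri_update`, `run_lri`, and the parameter values are illustrative.

```python
import random


def lri_update(p, action, beta, lam):
    """One L_R-I step: move p toward the unit vector e_action by lam * beta."""
    return [pj + lam * beta * ((1.0 if j == action else 0.0) - pj)
            for j, pj in enumerate(p)]


def run_lri(reward_probs, lam=0.01, steps=5000, seed=0):
    """Run one automaton against a hypothetical environment with binary rewards."""
    rng = random.Random(seed)
    p = [1.0 / len(reward_probs)] * len(reward_probs)
    for _ in range(steps):
        action = rng.choices(range(len(p)), weights=p)[0]  # sample an action from p
        beta = 1.0 if rng.random() < reward_probs[action] else 0.0
        p = lri_update(p, action, beta, lam)
    return p


if __name__ == "__main__":
    print(run_lri([0.2, 0.8]))
```

The update keeps p on the probability simplex, and for a small learning parameter λ its mean motion follows an ODE whose stable points favor the better-rewarded action, which is the small-λ approximation the abstract relies on.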