1985
DOI: 10.1109/tsmc.1985.6313407
A new approach to the design of reinforcement schemes for learning automata

Cited by 168 publications (81 citation statements)
References 0 publications
“…Thathachar and Sastry [3,29] were the first to introduce the concept of Pursuit Algorithms (PA), initiating the research on estimator algorithms [30,31]. As opposed to nonestimator algorithms, where the action probabilities are directly updated based on rewards/penalties, the estimator algorithms combine running estimates of reward probabilities to make the updating more goal-directed.…”
Section: Outline of the Classification of Learning Automata (mentioning)
confidence: 99%
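The estimator idea quoted above, keeping running estimates of each action's reward probability and steering the action-probability vector toward the best-estimated action, can be sketched as follows. This is a minimal illustrative pursuit-style scheme, not the exact algorithm of Thathachar and Sastry; the function name, learning rate, and reward interface are assumptions for the example.

```python
import random

def pursuit_learn(reward_fn, n_actions, lr=0.05, steps=2000, seed=0):
    """Illustrative pursuit-style scheme (not the exact 1985 algorithm):
    keep a running reward estimate per action and move the action-probability
    vector toward the unit vector of the currently best-estimated action."""
    rng = random.Random(seed)
    p = [1.0 / n_actions] * n_actions    # action probabilities
    est = [0.0] * n_actions              # running reward estimates
    counts = [0] * n_actions
    for _ in range(steps):
        a = rng.choices(range(n_actions), weights=p)[0]
        r = reward_fn(a, rng)            # environment feedback in [0, 1]
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]   # incremental sample mean
        best = max(range(n_actions), key=est.__getitem__)
        for i in range(n_actions):       # pursue the best-estimated action
            target = 1.0 if i == best else 0.0
            p[i] += lr * (target - p[i])
    return p, est
```

On a hypothetical two-armed bandit with success probabilities 0.2 and 0.8, the probability mass concentrates on the better arm, which is the goal-directed behavior the excerpt describes.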
“…Most of these approaches are action-value methods (Thathachar and Sastry 1995), in which the agent maintains a running estimate of the expected reward for each arm. This estimate can be computed by simply averaging the rewards the agent has received on each previous pull of the given arm.…”
Section: K-armed Bandit Problem (mentioning)
confidence: 99%
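The averaging step this excerpt describes, a running estimate of an arm's expected reward from its previous pulls, can be computed incrementally without storing the reward history. The function name is illustrative, not taken from the cited works.

```python
def incremental_average(rewards):
    """Sample-mean action-value estimate for one arm, updated incrementally:
    Q_n = Q_{n-1} + (r_n - Q_{n-1}) / n, so no reward history is stored."""
    q, n = 0.0, 0
    for r in rewards:
        n += 1
        q += (r - q) / n
    return q
```

For example, four pulls with rewards 1, 0, 1, 1 give an estimate of 0.75, the same value as averaging the stored history.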
“…Another approach is pursuit methods (Thathachar and Sastry 1995), which maintain both preferences and action-value estimates.…”
Section: K-armed Bandit Problem (mentioning)
confidence: 99%
“…The accuracy of the solution can be increased by choosing a finer discretization and hence increasing the number of actions of the automaton, which leads to slow convergence of the learning algorithm. In order to provide a higher rate of convergence for FALA, hierarchical structure LA [24], discretized LA [25], estimator algorithms [26,27,28], and pursuit algorithms [29,30,31,32] have been introduced. A more satisfying solution is to use CALA, in which the action set of the automaton is a continuous variable.…”
Section: Stochastic Learning Automata (mentioning)
confidence: 99%