Discretized Bayesian Pursuit – A New Scheme for Reinforcement Learning

Zhang, Xuan; Granmo, Ole‐Christoffer; Oommen, B. John

doi:10.1007/978-3-642-31087-4_79

Cited by 20 publications (16 citation statements)

References 17 publications (21 reference statements)

“…The most difficult part in the design and analysis of LA consists of the formal proofs of their convergence accuracies. The mathematical techniques used for the various families (FSSA, VSSA, Discretized, etc.) are quite distinct.…”

Section: Outline of the Classification of Learning Automata (mentioning)

On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata

2013

Self Cite

There are currently two fundamental paradigms that have been used to enhance the convergence speed of Learning Automata (LA). The first involves utilizing estimates of the reward probabilities, while the second involves discretizing the probability space in which the LA operates. This paper demonstrates how both can be utilized simultaneously, in particular by using the family of Bayesian estimates that have been proven to have distinct advantages over their maximum likelihood counterparts. The success of LA-based estimator algorithms over the classical Linear Reward-Inaction (L_RI)-like schemes can be explained by their ability to pursue the actions with the highest reward probability estimates. Without access to reward probability estimates, it makes sense for schemes like the L_RI to first make large exploring steps, and then to gradually turn exploration into exploitation by making progressively smaller learning steps. However, this behavior becomes counterintuitive when pursuing actions based on their estimated reward probabilities. Learning should then ideally proceed in progressively larger steps, as the reward probability estimates become more accurate. This paper introduces a new estimator algorithm, the Discretized Bayesian Pursuit Algorithm (DBPA), that achieves this by incorporating both of the above paradigms. The DBPA is implemented by linearly discretizing the action probability space of the Bayesian Pursuit Algorithm (BPA) (Zhang et al. in IEA-AIE 2011, Springer, New York, pp. 608-620, 2011). The key innovation of this paper is that the linear discrete updating rules mitigate the counterintuitive behavior of the corresponding linear continuous updating rules by augmenting them with the reward probability estimates. Extensive experimental results show the superiority of the DBPA over previous estimator algorithms. Indeed, the DBPA is probably the fastest reported LA to date.
Apart from the rigorous experimental demonstration of the strength of the DBPA, the paper also briefly records the proofs of why the BPA and the DBPA are ε-optimal in stationary environments.
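The DBPA idea summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Each action keeps a Beta(1, 1) prior that is updated by rewards and penalties. The pursued action is the one with the largest posterior mean; the BPA literature may use a different posterior statistic, such as a percentile. As in the Discretized Pursuit Algorithm, the probability vector changes only when the chosen action is rewarded, by one quantum Δ = 1/(rN) per non-pursued action, where r is the number of actions and N is a resolution parameter. The names `DiscretizedBayesianPursuit` and `run` are hypothetical.

```python
import random


class DiscretizedBayesianPursuit:
    """Hypothetical sketch of a discretized pursuit LA driven by Beta-posterior estimates."""

    def __init__(self, num_actions, resolution, rng=None):
        self.r = num_actions
        # Probabilities live on a linear grid with step Delta = 1 / (r * N); they are
        # stored as integer multiples of Delta so the grid is represented exactly.
        self.total = num_actions * resolution
        self.units = [resolution] * num_actions  # uniform start: p_j = 1/r
        # Beta(1, 1) priors (assumption): alpha counts rewards, beta counts penalties.
        self.alpha = [1] * num_actions
        self.beta = [1] * num_actions
        self.rng = rng if rng is not None else random.Random()

    def probabilities(self):
        return [u / self.total for u in self.units]

    def choose(self):
        """Sample action j with probability units[j] / total."""
        x = self.rng.randrange(self.total)
        for j, u in enumerate(self.units):
            if x < u:
                return j
            x -= u
        raise AssertionError("units must sum to total")

    def estimate(self, j):
        # Posterior mean of Beta(alpha, beta); a simplifying assumption.
        return self.alpha[j] / (self.alpha[j] + self.beta[j])

    def update(self, action, rewarded):
        if rewarded:
            self.alpha[action] += 1
        else:
            self.beta[action] += 1
            return  # reward-inaction: probabilities change only on reward (assumption)
        best = max(range(self.r), key=self.estimate)
        # Every non-pursued action loses one quantum (never below zero) ...
        for j in range(self.r):
            if j != best and self.units[j] > 0:
                self.units[j] -= 1
        # ... and the pursued action absorbs the remainder, so the total is preserved.
        self.units[best] = self.total - sum(u for j, u in enumerate(self.units) if j != best)

    def converged(self):
        return max(self.units) == self.total


def run(reward_probs, resolution, seed, max_steps=100_000):
    """Simulate a stationary Bernoulli environment; return (absorbed action, steps) or (None, max_steps)."""
    rng = random.Random(seed)
    la = DiscretizedBayesianPursuit(len(reward_probs), resolution, rng)
    for step in range(1, max_steps + 1):
        action = la.choose()
        la.update(action, rng.random() < reward_probs[action])
        if la.converged():
            return la.units.index(la.total), step
    return None, max_steps
```

For example, `run([0.9, 0.5, 0.2], resolution=100, seed=0)` simulates a three-action stationary environment. It returns the action the automaton was absorbed into and the number of iterations taken. Because probabilities are stored as integer multiples of Δ, they stay exactly on the linear grid, and the absorbing value p = 1 is reached exactly rather than approached asymptotically.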

“…In this paper, we extend the BPA into the domain of discretization, and propose a new Bayesian estimator algorithm, namely, the Discretized Bayesian Pursuit Algorithm (DBPA) [1]. Firstly, the DBPA maintains an action probability vector for selecting actions.…”

Section: Contributions and Paper Organization (mentioning)

On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata

2013

Self Cite

“…Although LA have been studied extensively [1, 13–15, 17] and been applied in many fields [4,9], designing LA when the number of actions involved, R, is large is extremely complex. The solution that we propose in this paper attempts to resolve this problem.…”

Section: Introduction (mentioning)

The Hierarchical Continuous Pursuit Learning Automation for Large Numbers of Actions

Yazidi, Zhang, Jiao, et al. 2018

IFIP Advances in Information and Communication Technology

Self Cite

Although the field of Learning Automata (LA) has made significant progress in the last four decades, LA-based methods for tackling problems involving environments with a large number of actions are, in reality, relatively unresolved. The extension of the traditional LA (fixed structure, variable structure, discretized, and pursuit) to problems within this domain cannot be easily established when the number of actions is very large. This is because the dimensionality of the action probability vector is correspondingly large, and consequently, most components of the vector will, after a relatively short time, have values that are smaller than the machine accuracy permits, implying that they will never be chosen. This paper pioneers a solution that extends the continuous pursuit paradigm to such large-actioned problem domains. The beauty of the solution is that it is hierarchical, with all the actions offered by the environment residing as leaves of the hierarchy. Further, at every level, we merely require a two-action LA, which automatically resolves the problem of dealing with arbitrarily small action probabilities. Additionally, since all the LA invoke the pursuit paradigm, the best action at every level trickles up towards the root. Thus, by invoking the property of the "max" operator, by which the maximum of numerous maxima is the overall maximum, the hierarchy of LA converges to the optimal action. Apart from reporting the theoretical properties of the scheme, the paper contains extensive experimental results which demonstrate the power of the scheme and its computational advantages. As far as we know, there are no comparable results in the field of LA.
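One plausible reading of the hierarchical scheme described above is sketched below. It is an assumption-laden illustration, not the authors' algorithm, and it makes the following assumptions:

- 2^depth actions sit at the leaves of a complete binary tree.
- Each leaf has a maximum-likelihood estimate seeded with one pseudo-reward in two pseudo-trials.
- A subtree's estimate is the maximum of its leaves' estimates (the "max of maxima" property).
- After every iteration, each node on the sampled path takes a linear continuous pursuit step p ← p + λ(target − p).

The class `HierarchicalPursuit` and the function `train` are hypothetical names.

```python
import random


class HierarchicalPursuit:
    """Hypothetical sketch: a binary tree of two-action continuous pursuit automata.

    The 2**depth actions are the leaves; each internal node chooses between its two children.
    """

    def __init__(self, depth, learning_rate, rng=None):
        self.depth = depth
        self.lam = learning_rate
        self.rng = rng if rng is not None else random.Random()
        # p_left[d][n]: probability that node n on level d selects its left child.
        self.p_left = [[0.5] * (2 ** d) for d in range(depth)]
        leaves = 2 ** depth
        # Leaf estimates start from one pseudo-reward in two pseudo-trials (assumption).
        self.rewards = [1] * leaves
        self.trials = [2] * leaves
        # best[d][n]: largest leaf estimate in the subtree rooted at node n on level d.
        self.best = [[0.5] * (2 ** d) for d in range(depth + 1)]

    def choose(self):
        """Walk from the root to a leaf; return the leaf and the (level, node) pairs visited."""
        node, path = 0, []
        for d in range(self.depth):
            path.append((d, node))
            go_left = self.rng.random() < self.p_left[d][node]
            node = 2 * node + (0 if go_left else 1)
        return node, path

    def update(self, leaf, path, rewarded):
        self.trials[leaf] += 1
        self.rewards[leaf] += int(rewarded)
        self.best[self.depth][leaf] = self.rewards[leaf] / self.trials[leaf]
        # Bottom-up along the visited path: each node stores the maximum of its children's
        # maxima and takes a linear pursuit step towards the child holding the larger one.
        for d, node in reversed(path):
            left_best = self.best[d + 1][2 * node]
            right_best = self.best[d + 1][2 * node + 1]
            self.best[d][node] = max(left_best, right_best)
            target = 1.0 if left_best >= right_best else 0.0
            self.p_left[d][node] += self.lam * (target - self.p_left[d][node])

    def leaf_probabilities(self):
        """Probability of reaching each leaf: the product of the choice probabilities on its path."""
        probs = [1.0]
        for d in range(self.depth):
            nxt = []
            for n, q in enumerate(probs):
                nxt += [q * self.p_left[d][n], q * (1 - self.p_left[d][n])]
            probs = nxt
        return probs


def train(reward_probs, depth, learning_rate, steps, seed):
    rng = random.Random(seed)
    la = HierarchicalPursuit(depth, learning_rate, rng)
    for _ in range(steps):
        leaf, path = la.choose()
        la.update(leaf, path, rng.random() < reward_probs[leaf])
    return la
```

For example, `train([0.2, 0.9, 0.3, 0.4], depth=2, learning_rate=0.005, steps=20000, seed=0).leaf_probabilities()` returns the probability of selecting each of the four actions. In this sketch every node only ever chooses between two children, and a leaf's probability is the product of the choices along its path.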

“…If the values allowed are equally spaced in this interval, the discretization is said to be linear, otherwise, the discretization is called non-linear. Following the discretization concept, many of the continuous VSSA have been discretized; indeed, discretized versions of almost all continuous automata have been reported [10,13,14].…”

Section: Introduction (mentioning)
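The contrast that the DBPA abstract draws between linear continuous and linear discretized updates can be illustrated numerically. The sketch below is not taken from the papers, and its function names are hypothetical. Under the continuous rule p ← p + λ(1 − p), the pursued action's probability moves by steps that shrink geometrically. Under a linearly discretized rule on the grid {0, 1/N, 2/N, ..., 1}, the probability moves by a constant quantum 1/N and reaches the absorbing value 1 in finitely many steps.

```python
def continuous_increments(p, lam, n):
    """Increments of the linear continuous rule p <- p + lam * (1 - p) over n steps."""
    out = []
    for _ in range(n):
        step = lam * (1 - p)  # shrinks by a factor (1 - lam) every step
        out.append(step)
        p += step
    return out


def discretized_increments(units, resolution, n):
    """Increments of a linear discretized rule on the grid {0, 1/N, ..., 1}.

    The probability is held as an integer number of quanta (units / resolution)
    so that the grid, and the absorbing value 1, are represented exactly.
    """
    out = []
    for _ in range(n):
        step = 1 if units < resolution else 0  # one quantum per step until absorbed at 1
        out.append(step / resolution)
        units += step
    return out
```

For instance, `discretized_increments(5, 10, 7)` returns `[0.1, 0.1, 0.1, 0.1, 0.1, 0.0, 0.0]`: five quanta take p from 0.5 to exactly 1, after which it stays absorbed. By contrast, `continuous_increments(0.5, 0.1, 3)` yields increments of roughly 0.05, 0.045 and 0.0405, and p never reaches 1.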

“…In order to highlight the distinct characteristics of the DPA within the family of PAs, the continuous version is referred to as the CPA. We briefly mention that discretized versions of all the reported EA schemes have been devised [9,13,14].…”

Section: Introduction (mentioning)

A formal proof of the 𝜖-optimality of discretized pursuit algorithms

et al. 2015

Self Cite

Learning Automata (LA) can be reckoned to be the founding algorithms on which the field of Reinforcement Learning has been built. Among the families of LA, Estimator Algorithms (EAs) are certainly the fastest, and of these, the family of discretized algorithms is proven to converge even faster than its continuous counterparts. However, it has recently been reported that the previous proofs of ε-optimality for all the algorithms reported over the past three decades have been flawed. We applaud the researchers who discovered this flaw, and who further proceeded to rectify the proof for the Continuous Pursuit Algorithm (CPA). The latter proof examines the monotonicity property of the probability of selecting the optimal action, and requires the learning parameter to be continuously changing. In this paper, we provide a new method to prove the ε-optimality of the Discretized Pursuit Algorithm (DPA) which does not require this constraint, by virtue of the fact that the DPA has, in and of itself, absorbing barriers to which the LA can jump in a discretized manner. Unlike the proof given in [3] for an absorbing version of the CPA, which utilizes the single-action version of Hoeffding's inequality, the current proof invokes what we shall refer to as the "multi-action" version of Hoeffding's inequality. We believe that our proof is both unique and pioneering. It can also form the basis for formally showing the ε-optimality of the other EAs that possess absorbing states.
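For reference, one common way to obtain a bound over all actions at once combines the textbook two-sided Hoeffding inequality for Bernoulli rewards with a union bound over the r actions. Treating this as the "multi-action" version is an assumption; the paper's exact statement may differ. The notation is also assumed here. For each action i, d_i is its reward probability and \hat{d}_i its sample-mean estimate from a fixed number n_i of independent trials; m denotes the optimal action.

```latex
% Single action i (Hoeffding, rewards in [0, 1], fixed n_i):
\Pr\big(|\hat{d}_i - d_i| \ge \epsilon\big) \le 2\exp(-2 n_i \epsilon^2)

% All r actions at once (union bound), assuming n_i \ge n for every i:
\Pr\big(\exists i:\ |\hat{d}_i - d_i| \ge \epsilon\big)
  \le \sum_{i=1}^{r} 2\exp(-2 n_i \epsilon^2)
  \le 2r\exp(-2 n \epsilon^2)

% If 2\epsilon < d_m - \max_{j \ne m} d_j, then on the complementary event
% \hat{d}_m > d_m - \epsilon > d_j + \epsilon > \hat{d}_j for every j \ne m,
% so the highest-estimate (pursued) action is the optimal one.
```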
