Finite time analysis of the pursuit algorithm for learning automata

Rajaraman, K.C.; Sastry, P. S.

doi:10.1109/3477.517033

Cited by 40 publications

(52 citation statements)

References 11 publications


“…In other words, the proofs of these results are already found in the literature, namely in [26] and in [23] respectively. The proofs are thus not repeated here.…”

Section: 1 (mentioning)

confidence: 56%

“…The proofs for the convergence of PAs have been studied and reported for decades in [6], [23], [24], [25] [26], which, unfortunately, all have a common flaw that has been recently discovered by the authors of [27]. Further, the authors of [27] submitted a new proof for the convergence of the CPA which adequately rectified the flawed proofs.…”

Section: Proofs of PAs (mentioning)

confidence: 99%


The design of absorbing Bayesian pursuit algorithms and the formal analyses of their ε-optimality

Zhang, Oommen, Granmo (2016). Pattern Anal Applic


The fundamental phenomenon that has been used to enhance the convergence speed of Learning Automata (LA) is that of incorporating the running Maximum Likelihood (ML) estimates of the action reward probabilities into the probability updating rules for selecting the actions. The frontiers of this field have recently been expanded by replacing the ML estimates with their corresponding Bayesian counterparts, which incorporate the properties of the conjugate priors [1][2][3]. These constitute the Bayesian Pursuit Algorithm (BPA) [1] and the Discretized Bayesian Pursuit Algorithm (DBPA) [2,3]. Although these algorithms have been designed and efficiently implemented (footnote 1), and are, arguably, the fastest and most accurate LA reported in the literature (footnote 2), the proof of their ε-optimal convergence has remained unsolved. Resolving this is precisely the intent of this paper. In this paper, we present a single unifying analysis by which both the continuous and discretized schemes are proven to be ε-optimal. We emphasize that, unlike the ML-based pursuit schemes, the Bayesian schemes have to consider not only the estimates themselves but also the distributional forms of their conjugate posteriors and their higher-order moments, all of which render the proofs particularly challenging. As far as we know, apart from the results themselves, the methodologies of this proof have been unreported in the literature; they are both pioneering and novel.

‡ This author can be contacted at: Department of ICT, University of Agder, Grimstad, Norway. E-mail address: ole.granmo@uia.no.

Footnote 1: The version of the BPA presented here, namely the Absorbing Bayesian Pursuit Algorithm (ABPA), is distinct from the version presented in [1]. The reason for proposing this newer absorbing version will be explained in the body of the paper.

Footnote 2: The families of BPA are faster and more accurate than their counterparts that invoke the ML estimates because, unlike the latter, which use the information in the mean, the BPA families utilize the information at a higher-end quantile (the 95th percentile) of the posterior Bayesian distribution. This is the rationale for the claim that they are probably the fastest and most accurate reported LA.
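The abstract above refers to pursuit schemes that feed running ML estimates of the reward probabilities into the action-probability update. As a rough illustration only, the following is a minimal sketch of a continuous pursuit update on a Bernoulli environment. It is not the BPA or ABPA of the cited paper: the function name `continuous_pursuit`, the parameter `lam`, the ten initial samples per action, and the simulated environment are all assumptions made for this example.

```python
import random

def continuous_pursuit(reward_probs, lam=0.01, steps=5000, seed=0):
    """Illustrative continuous pursuit automaton (hypothetical helper)."""
    rng = random.Random(seed)
    r = len(reward_probs)
    p = [1.0 / r] * r      # action-selection probabilities
    wins = [0] * r         # rewards observed per action
    pulls = [0] * r        # times each action was tried

    # Assumption: seed the ML estimates by trying each action 10 times.
    for i in range(r):
        for _ in range(10):
            pulls[i] += 1
            wins[i] += rng.random() < reward_probs[i]

    for _ in range(steps):
        # Choose an action according to the current probability vector.
        a = rng.choices(range(r), weights=p)[0]
        reward = rng.random() < reward_probs[a]
        pulls[a] += 1
        wins[a] += reward

        # Running ML estimate of each action's reward probability.
        est = [wins[i] / pulls[i] for i in range(r)]
        best = max(range(r), key=est.__getitem__)

        # Pursue: move p a fraction lam toward the unit vector of `best`.
        p = [(1 - lam) * pi for pi in p]
        p[best] += lam
    return p
```

Calling `continuous_pursuit([0.1, 0.9])` returns the final probability vector. A smaller `lam` slows the update, which is the sense in which the quoted statements speak of a "sufficiently small value for the learning parameter".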


“…The formal "corrected" proof for the finite time analysis of the DPA [2] remains open. It is currently being investigated.…”

Section: Discussion (mentioning)

confidence: 99%

“…Hence the dilemma! Prior Proofs: The ε-optimality of the families of PAs have been presented in [2], [13], [10], [11], and [12]. The basic result stated in these papers is that by utilizing a sufficiently small value for the learning parameter (or resolution), both the CPA and the DPA will converge to the optimal action with an arbitrarily large probability.…”

Section: Proof Complexity for EAs (mentioning)

confidence: 99%


A formal proof of the 𝜖-optimality of discretized pursuit algorithms

et al. 2015


Learning Automata (LA) can be reckoned to be the founding algorithms on which the field of Reinforcement Learning has been built. Among the families of LA, Estimator Algorithms (EAs) are certainly the fastest, and of these, the family of discretized algorithms is proven to converge even faster than its continuous counterparts. However, it has recently been reported that the previous proofs of ε-optimality for all the algorithms reported over the past three decades have been flawed. We applaud the researchers who discovered this flaw, and who further proceeded to rectify the proof for the Continuous Pursuit Algorithm (CPA). The latter proof examines the monotonicity property of the probability of selecting the optimal action, and requires the learning parameter to be continuously changing. In this paper, we provide a new method to prove the ε-optimality of the Discretized Pursuit Algorithm (DPA) which does not require this constraint, by virtue of the fact that the DPA has, in and of itself, absorbing barriers to which the LA can jump in a discretized manner. Unlike the proof given in [3] for an absorbing version of the CPA, which utilizes the single-action Hoeffding's inequality, the current proof invokes what we shall refer to as the "multi-action" version of Hoeffding's inequality. We believe that our proof is both unique and pioneering. It can also form the basis for formally showing the ε-optimality of the other EAs that possess absorbing states.
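To make the idea of discretized steps and absorbing barriers more concrete, here is a minimal sketch of a discretized pursuit update. It is an illustration under stated assumptions, not the algorithm or proof of the cited paper: the function name `discretized_pursuit`, the `resolution` parameter, the reward-only update, the ten initial samples per action, and the choice to stop once one action holds all the probability mass are assumptions made for this example.

```python
import random

def discretized_pursuit(reward_probs, resolution=100,
                        max_steps=100_000, seed=0):
    """Illustrative discretized pursuit automaton (hypothetical helper)."""
    rng = random.Random(seed)
    r = len(reward_probs)
    total = r * resolution     # mass in units of delta = 1 / (r * resolution)
    units = [resolution] * r   # p[i] == units[i] / total
    wins = [0] * r
    pulls = [0] * r

    # Assumption: seed the estimates by trying each action 10 times.
    for i in range(r):
        for _ in range(10):
            pulls[i] += 1
            wins[i] += rng.random() < reward_probs[i]

    for _ in range(max_steps):
        # This sketch treats "one action holds all mass" as absorbing.
        if max(units) == total:
            break
        a = rng.choices(range(r), weights=units)[0]
        reward = rng.random() < reward_probs[a]
        pulls[a] += 1
        wins[a] += reward
        if reward:
            # Shift one delta from every other action toward the action
            # with the highest current reward estimate.
            best = max(range(r), key=lambda i: wins[i] / pulls[i])
            for j in range(r):
                if j != best:
                    units[j] = max(units[j] - 1, 0)
            units[best] = total - sum(
                units[j] for j in range(r) if j != best)
    return [u / total for u in units]
```

Keeping the probabilities as integer multiples of the step size avoids floating-point drift, so the check for reaching the barrier is exact. A larger `resolution` gives smaller steps, which plays the role of the "resolution" parameter mentioned in the quoted citation statements.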


On Utilizing Stochastic Learning Weak Estimators for Training and Classification of Patterns with Non-stationary Distributions

Oommen, Rueda (2005). KI 2005: Advances in Artificial Intelligence
