1985
DOI: 10.1109/tsmc.1985.6313407
A new approach to the design of reinforcement schemes for learning automata

Cited by 168 publications (81 citation statements)
References 0 publications
“…Thathachar and Sastry [3,29] were the first to introduce the concept of Pursuit Algorithms (PA), initiating the research on estimator algorithms [30,31]. As opposed to nonestimator algorithms, where the action probabilities are directly updated based on rewards/penalties, the estimator algorithms combine running estimates of reward probabilities to make the updating more goal-directed.…”
Section: Outline of the Classification of Learning Automata (mentioning)
confidence: 99%
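The estimator idea quoted above, keeping running estimates of each action's reward probability and steering the action-probability vector toward the best-estimated action, can be sketched as follows. This is a minimal illustrative pursuit-style scheme, not the exact algorithm of Thathachar and Sastry; the function name, learning rate, and reward interface are assumptions for the example.

```python
import random

def pursuit_learn(reward_fn, n_actions, lr=0.05, steps=2000, seed=0):
    """Illustrative pursuit-style scheme (not the exact 1985 algorithm):
    keep a running reward estimate per action and move the action-probability
    vector toward the unit vector of the currently best-estimated action."""
    rng = random.Random(seed)
    p = [1.0 / n_actions] * n_actions    # action probabilities
    est = [0.0] * n_actions              # running reward estimates
    counts = [0] * n_actions
    for _ in range(steps):
        a = rng.choices(range(n_actions), weights=p)[0]
        r = reward_fn(a, rng)            # environment feedback in [0, 1]
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]   # incremental sample mean
        best = max(range(n_actions), key=est.__getitem__)
        for i in range(n_actions):       # pursue the best-estimated action
            target = 1.0 if i == best else 0.0
            p[i] += lr * (target - p[i])
    return p, est
```

On a hypothetical two-armed bandit with success probabilities 0.2 and 0.8, the probability mass concentrates on the better arm, which is the goal-directed behavior the excerpt describes.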
“…Most of these approaches are action-value methods (Thathachar and Sastry 1995), in which the agent maintains a running estimate of the expected reward for each arm. This estimate can be computed by simply averaging the rewards the agent has received on each previous pull of the given arm.…”
Section: K-armed Bandit Problem (mentioning)
confidence: 99%
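The averaging step this excerpt describes, a running estimate of an arm's expected reward from its previous pulls, can be computed incrementally without storing the reward history. The function name is illustrative, not taken from the cited works.

```python
def incremental_average(rewards):
    """Sample-mean action-value estimate for one arm, updated incrementally:
    Q_n = Q_{n-1} + (r_n - Q_{n-1}) / n, so no reward history is stored."""
    q, n = 0.0, 0
    for r in rewards:
        n += 1
        q += (r - q) / n
    return q
```

For example, four pulls with rewards 1, 0, 1, 1 give an estimate of 0.75, the same value as averaging the stored history.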
“…Another approach is pursuit methods (Thathachar and Sastry 1995), which maintain both preferences and action-value estimates.…”
Section: K-armed Bandit Problem (mentioning)
confidence: 99%
“…The accuracy of the solution can be increased by choosing a finer discretization and hence increasing the number of actions of the automaton, which leads to slow convergence of the learning algorithm. In order to provide a higher rate of convergence for FALA, hierarchical structure LA [24], discretized LA [25], estimator algorithms [26,27,28], and pursuit algorithms [29,30,31,32] have been introduced. A more satisfying solution is to use CALA, in which the action set of the automaton is a continuous variable.…”
Section: Stochastic Learning Automata (mentioning)
confidence: 99%