2011
DOI: 10.1007/978-3-642-23808-6_1

Sparse Kernel-SARSA(λ) with an Eligibility Trace

Abstract: We introduce the first online kernelized version of SARSA(λ) to permit sparsification for arbitrary λ, 0 ≤ λ ≤ 1; this is possible via a novel kernelization of the eligibility trace that is maintained separately from the kernelized value function. This separation is crucial for preserving the functional structure of the eligibility trace when using sparse kernel projection techniques that are essential for memory efficiency and capacity control. The result is a simple and practical Kernel-SARSA(λ) …
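
To make the construction described in the abstract concrete, the following is a rough, illustrative Python sketch of an online kernel SARSA(λ) learner in which the action-value function and the eligibility trace are kept as separate coefficient vectors over a shared kernel dictionary, with an ALD-style novelty test for sparsification. The RBF kernel, all parameter names, and the ALD threshold are assumptions made for illustration; this is not the exact algorithm or notation of the paper.

```python
import numpy as np

def rbf_kernel(x, y, sigma=0.5):
    """Gaussian RBF kernel between two state-action feature vectors."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

class SparseKernelSarsaLambda:
    """Illustrative online kernel SARSA(lambda) with a separately
    kernelized eligibility trace.

    Both Q and the trace are expansions over the same dictionary of stored
    state-action points, but with separate coefficient vectors (w and e),
    so the trace keeps its own functional form as the dictionary changes.
    """

    def __init__(self, gamma=0.99, lam=0.9, alpha=0.1, ald_nu=0.01, kernel=rbf_kernel):
        self.gamma, self.lam, self.alpha, self.ald_nu = gamma, lam, alpha, ald_nu
        self.kernel = kernel
        self.dictionary = []            # stored state-action points
        self.w = np.zeros(0)            # value-function coefficients
        self.e = np.zeros(0)            # eligibility-trace coefficients
        self.K_inv = np.zeros((0, 0))   # inverse kernel matrix of the dictionary

    def _k_vec(self, x):
        return np.array([self.kernel(x, z) for z in self.dictionary])

    def q_value(self, x):
        return float(self.w @ self._k_vec(x)) if self.dictionary else 0.0

    def _project_or_add(self, x):
        """ALD-style sparsification: add x to the dictionary only if it is
        not approximately linearly dependent on it in feature space;
        otherwise represent x by its projection coefficients."""
        if not self.dictionary:
            self.dictionary.append(x)
            self.w, self.e = np.zeros(1), np.zeros(1)
            self.K_inv = np.array([[1.0 / self.kernel(x, x)]])
            return np.array([1.0])
        k = self._k_vec(x)
        a = self.K_inv @ k                    # projection onto dictionary features
        residual = self.kernel(x, x) - k @ a  # ALD test statistic
        if residual > self.ald_nu:
            n = len(self.dictionary)
            self.dictionary.append(x)
            new_inv = np.zeros((n + 1, n + 1))
            new_inv[:n, :n] = self.K_inv + np.outer(a, a) / residual
            new_inv[:n, n] = new_inv[n, :n] = -a / residual
            new_inv[n, n] = 1.0 / residual
            self.K_inv = new_inv
            self.w = np.append(self.w, 0.0)
            self.e = np.append(self.e, 0.0)
            coeffs = np.zeros(n + 1)
            coeffs[n] = 1.0
            return coeffs
        return a

    def update(self, x, reward, x_next, terminal=False):
        """One on-policy TD(lambda) step for the transition x -> x_next,
        where x and x_next are state-action feature vectors."""
        q = self.q_value(x)
        q_next = 0.0 if terminal else self.q_value(x_next)
        coeffs = self._project_or_add(x)
        td_error = reward + self.gamma * q_next - q
        self.e = self.gamma * self.lam * self.e + coeffs   # kernelized trace update
        self.w = self.w + self.alpha * td_error * self.e   # value-function update
        return td_error
```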

Cited by 8 publications (8 citation statements); citing publications span 2012–2020.
References 13 publications (17 reference statements).

Citation statements, ordered by relevance:
“…Mountain Car is one of the domains from the standard benchmarks of the NIPS 2005 Workshop. These benchmarks are used to evaluate the performance of various reinforcement learning algorithms [1], [20]–[23], [26], [31]. Note that OSKTD is a method for prediction.…”
Section: Methods (mentioning)
confidence: 99%
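
For readers unfamiliar with the benchmark, a minimal way to run an episode of the Mountain Car domain today is through the Gymnasium port of the classic environment. The NIPS 2005 benchmark suite used its own interface, so the environment name and API below are assumptions about that modern port, and the random policy is only a placeholder for a learned one.

```python
import gymnasium as gym  # assumption: modern Gymnasium port of the classic domain

env = gym.make("MountainCar-v0")
obs, info = env.reset(seed=0)
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()  # placeholder policy; a learned SARSA(lambda) policy would go here
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print("episode return:", total_reward)
```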
“…2) Approach of Kernel Redefinition: Let us consider the elements β(s, s_i) and k(s, s_i) in the value function (31). Let…”
Section: Properties of the Selective Kernel-Based Value Function (mentioning)
confidence: 99%
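
Equation (31) of the citing paper is not reproduced in the excerpt. Purely as an illustration of the kind of value function the quote refers to, one in which each stored state s_i contributes through both a selective factor β(s, s_i) and a kernel k(s, s_i), a generic evaluation could look like the sketch below; the functional form, names, and weighting are assumptions, not the paper's definition.

```python
import numpy as np

def rbf(s, s_i, sigma=1.0):
    """Gaussian RBF kernel k(s, s_i)."""
    d = np.asarray(s, dtype=float) - np.asarray(s_i, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

def selective_kernel_value(s, centers, alphas, kernel=rbf, selector=lambda s, c: 1.0):
    """Illustrative selective kernel value function: each stored state s_i
    contributes through a gate beta(s, s_i) (the selector) and a kernel
    k(s, s_i), weighted by a learned coefficient alpha_i."""
    return sum(a * selector(s, c) * kernel(s, c) for a, c in zip(alphas, centers))
```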
“…Studies of individual and organizational decision making document several types of failures to learn from real‐world feedback based on the gaps between what was expected and what actually occurred, or between what was achieved and what could have been achieved by better decisions (if this is known). Formal models of how to adaptively modify decision processes or decision rules to reduce regret—for example, by selecting actions next time a situation is encountered in a Markov decision process, or in a game against nature (with an unpredictable, possibly adversarial, environment) using probabilities that reflect cumulative regret for not having used each action in such situations in the past—require explicitly collecting and analyzing such data. Less formally, continually assessing the outcomes of decisions and how one might have done better, as required by the regret‐minimization framework, means that opportunities to learn from experience will more often be exploited instead of missed. Increase experimentation and adaptation.…”
Section: Doing Better: Using Predictable Rational Regret To Improve BCA (mentioning)
confidence: 99%
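
The rule described in this passage, choosing actions with probabilities that reflect cumulative regret for not having used each action, matches regret matching; a minimal sketch follows. It assumes the payoff of every action is observable after the fact, which fits the "game against nature" setting rather than the on-policy setting discussed in the next statement.

```python
import numpy as np

def regret_matching_probabilities(cumulative_regret):
    """Choose actions with probability proportional to positive cumulative
    regret; fall back to uniform if no action has positive regret."""
    positive = np.maximum(np.asarray(cumulative_regret, dtype=float), 0.0)
    total = positive.sum()
    return positive / total if total > 0.0 else np.full(len(positive), 1.0 / len(positive))

def update_cumulative_regret(cumulative_regret, payoffs, chosen_action):
    """Accumulate, per action, how much better it would have done than the
    action actually chosen (assumes all payoffs are observable afterwards)."""
    payoffs = np.asarray(payoffs, dtype=float)
    return np.asarray(cumulative_regret, dtype=float) + (payoffs - payoffs[chosen_action])
```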
“…In this case, formal models of regret reduction typically require exploring different decision rules to find out what works best. Such learning strategies (called “on‐policy” learning algorithms, since they learn only from experience with the policy actually used, rather than from information about what would have happened if something different had been tried) have been extensively developed and applied successfully to regret reduction in machine learning and game theory. They adaptively weed out the policies that are followed by the least desirable consequences, and increase the selection probabilities for policies that are followed by preferred consequences.…”
Section: Doing Better: Using Predictable Rational Regret To Improve BCA (mentioning)
confidence: 99%
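
As a small illustration of the on-policy idea in this passage, learning only from the consequences of the policy actually followed while shifting probability toward better-performing actions, here is a gradient-bandit style preference update. It is a generic sketch, not any of the specific algorithms the review cites, and reward_fn stands in for whatever feedback the environment returns for the chosen action.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    z = np.exp(prefs - prefs.max())
    return z / z.sum()

def on_policy_step(prefs, baseline, reward_fn, step_size=0.1, baseline_rate=0.05):
    """One on-policy step: sample from the current policy, observe only the
    chosen action's outcome, and shift preferences toward actions that did
    better than a running baseline (and away from those that did worse)."""
    probs = softmax(prefs)
    action = int(rng.choice(len(prefs), p=probs))
    reward = reward_fn(action)          # feedback only for the action actually taken
    advantage = reward - baseline
    grad = -probs * advantage           # lower all preferences in proportion to their probability...
    grad[action] += advantage           # ...except the chosen action, which moves with its advantage
    new_baseline = baseline + baseline_rate * (reward - baseline)
    return prefs + step_size * grad, new_baseline
```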