2016
DOI: 10.1109/access.2016.2556579
Meta-learning within Projective Simulation

Abstract: Learning models of artificial intelligence can nowadays perform very well on a large variety of tasks. However, in practice different task environments are best handled by different learning models, rather than a single, universal, approach. Most non-trivial models thus require the adjustment of several to many learning parameters, which is often done on a case-by-case basis by an external party. Meta-learning refers to the ability of an agent to autonomously and dynamically adjust its own learning parameters, …


Citation Types: 1 supporting, 37 mentioning, 0 contrasting
Year Published: 2016–2021
Cited by 32 publications (38 citation statements)
References 42 publications
“…In general this should, and can, be automatized. In recent work, it was shown how these parameters can be learned in the context of the PS, but similar results have been obtained for other models as well (see [17], and references therein). One way to think about this setting is to think of each configuration of parameter settings as fixing some particular learning agent/model A_k.…”
Section: Exploration (supporting)
confidence: 64%
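The "each parameter configuration fixes an agent A_k" viewpoint can be made concrete by selecting among a finite set of configurations with a simple bandit strategy. The sketch below is illustrative only and is not the procedure of [17]; `run_episode(config)` is a hypothetical stand-in that runs one episode of the agent fixed by `config` and returns the reward obtained.

```python
import random

# Epsilon-greedy selection among agents A_k, one per parameter configuration.
# `run_episode(config)` is a hypothetical stand-in: it runs one episode of
# the agent fixed by `config` and returns the reward obtained.

def select_agent(configs, run_episode, episodes=1000, eps=0.1):
    totals = [0.0] * len(configs)  # accumulated reward per configuration
    counts = [0] * len(configs)    # episodes played per configuration

    def mean(k):
        # Untried configurations get priority so every arm is sampled.
        return totals[k] / counts[k] if counts[k] else float("inf")

    for _ in range(episodes):
        if random.random() < eps:
            k = random.randrange(len(configs))      # explore
        else:
            k = max(range(len(configs)), key=mean)  # exploit best estimate
        totals[k] += run_episode(configs[k])
        counts[k] += 1
    return max(range(len(configs)), key=lambda k: totals[k] / counts[k] if counts[k] else 0.0)
```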
“…Often, we are facing models at the opposite extreme, where the parameter space has plenty of structure. For example, in [17], the meta-parameter γ (the "forgetting" parameter, but the details are not relevant here) of the PS model is optimized. The analysis of this work also suggests that, in many environments, the abstract mapping eval : [0, 1] → ℝ⁺, which maps the parameter value to an averaged performance, is a unimodal function: it is increasing up to some value γ_opt, and decreasing afterwards.…”
Section: Exploration (mentioning)
confidence: 99%
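Since the quoted analysis takes the averaged performance to be unimodal in γ, increasing up to γ_opt and decreasing afterwards, the optimum can be located by a standard ternary search over [0, 1]. A minimal sketch, assuming a hypothetical `evaluate_performance(gamma)` that runs the PS agent with damping γ and returns its performance averaged over enough trials that the unimodal shape dominates the noise:

```python
# Ternary search for gamma_opt of a unimodal performance curve on [0, 1].
# `evaluate_performance` is a hypothetical stand-in for running the agent
# with the given gamma and averaging its reward over many trials.

def find_gamma_opt(evaluate_performance, lo=0.0, hi=1.0, tol=1e-3):
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if evaluate_performance(m1) < evaluate_performance(m2):
            lo = m1  # the maximum cannot lie in [lo, m1]
        else:
            hi = m2  # the maximum cannot lie in [m2, hi]
    return (lo + hi) / 2.0
```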
“…If we additionally label each percept (think positions, or directions of optimal moves in maze problems), then many well-studied reinforcement learning models (e.g. Q-Learning [14,26], Policy iteration [34] or the more recent Projective Simulation [35,36] model), together with the maze environment (with a unique winning path), do form luck-favoring pairs for all histories, so Theorem 1 applies. To further explain why this is the case (but without going into the details of these learning models), recall that in this single-win, bounded maximal time (M) case, there is only one M-length history which has a reward.…”
Section: A Q a Q E (mentioning)
confidence: 99%
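The single-win, bounded-time setting described in the quote can be illustrated with a toy example: a length-M chain "maze" in which exactly one action sequence, hence exactly one M-length history, earns a reward, paired with tabular Q-learning. This is an illustrative sketch, not the construction of the cited work; `M`, `WIN_PATH` and the learning rates are arbitrary choices.

```python
import random

# Toy single-win maze: states 0..M-1 along a chain. Only the unique correct
# action at every state leads onward; the sole reward sits at the exit, so
# exactly one M-length history is rewarded.

M = 5
WIN_PATH = [1, 0, 1, 1, 0]  # the one rewarded action sequence (illustrative)

def episode(Q, eps=0.1, alpha=0.5, discount=0.9):
    state = 0
    while True:
        if random.random() < eps:
            action = random.randrange(2)                      # explore
        else:
            action = max((0, 1), key=lambda a: Q[(state, a)]) # exploit
        done_win = action == WIN_PATH[state] and state == M - 1
        done_lose = action != WIN_PATH[state]
        reward = 1.0 if done_win else 0.0
        nxt = state + 1
        best_next = 0.0 if (done_win or done_lose) else max(Q[(nxt, 0)], Q[(nxt, 1)])
        Q[(state, action)] += alpha * (reward + discount * best_next - Q[(state, action)])
        if done_win or done_lose:
            return reward
        state = nxt

Q = {(s, a): 0.0 for s in range(M) for a in (0, 1)}
wins = sum(episode(Q) for _ in range(3000))  # success count rises as Q converges
```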
“…The glow parameter is relevant in environments with delayed rewards such as the GridWorld [23] discussed in section 4. For a more detailed description of PS we refer the reader to [38,44,45].…”
Section: Projective Simulation (mentioning)
confidence: 99%
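For concreteness, a minimal sketch of a PS-style update in which glow mediates delayed rewards: each traversed percept-action edge is marked with glow g = 1, glow decays by a factor (1 − η) per step, and a later reward is credited to edges in proportion to their remaining glow, while γ damps the h-values back toward 1. Parameter names and the exact update follow common PS conventions rather than any single one of [38,44,45].

```python
import random

# Minimal sketch of a two-layer PS agent with glow, assuming standard PS
# conventions (gamma: "forgetting"/damping toward 1; eta: glow damping).
# Illustrative only; see [38,44,45] for the actual model definitions.

class PSAgent:
    def __init__(self, n_percepts, n_actions, gamma=0.01, eta=0.2):
        self.h = [[1.0] * n_actions for _ in range(n_percepts)]  # hopping weights
        self.g = [[0.0] * n_actions for _ in range(n_percepts)]  # edge glow
        self.gamma, self.eta = gamma, eta

    def act(self, percept):
        # Sample an action proportionally to the h-values of this percept.
        weights = self.h[percept]
        action = random.choices(range(len(weights)), weights=weights)[0]
        self.g[percept][action] = 1.0  # full glow on the traversed edge
        return action

    def update(self, reward):
        # Called once per time step; reward may be 0 (delayed-reward setting).
        for s, row in enumerate(self.h):
            for a in range(len(row)):
                row[a] += -self.gamma * (row[a] - 1.0) + self.g[s][a] * reward
                self.g[s][a] *= 1.0 - self.eta  # glow fades over time
```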