2011
DOI: 10.1007/978-3-642-22887-2_4

Sequential Constant Size Compressors for Reinforcement Learning

Cited by 13 publications (14 citation statements)
References 15 publications
Citation types: 0 supporting, 14 mentioning, 0 contrasting
“…Our RL RNNs now outperform many previous methods on benchmarks [21], creating memories of important events and solving numerous tasks unsolvable by classical RL methods. Several best paper awards resulted from this research, e.g., [18,92].…”
Section: Recurrent / Deep Neural Network (mentioning)
confidence: 99%
“…TD [23], policy gradients [22], etc. ), is to combine action learning with an unsupervised learning (UL) preprocessor or "compressor" which provides a lower-dimensional feature vector that the agent receives as input instead of the raw observation [5,8,11,15,17,18,19]. The UL compressor is trained on the high-dimensional observations generated by the learning agent's actions, that the agent then uses as a state representation to learn a value function.…”
Section: Introduction (mentioning)
confidence: 99%
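
The excerpt above describes coupling an RL learner with an unsupervised "compressor" that maps high-dimensional observations to a low-dimensional feature vector, with the value function learned on the compressed codes. The sketch below is a minimal illustration of that idea, not the paper's actual method: the linear-autoencoder compressor, TD(0) value learner, dimensions, and placeholder environment are all illustrative assumptions.

```python
# Minimal sketch (not the paper's method): an unsupervised compressor trained online
# on raw observations, whose low-dimensional codes feed a TD(0) value learner.
# Environment, compressor choice, and dimensions below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, CODE_DIM = 100, 8          # high-dimensional observation -> small feature vector

# Linear autoencoder as a stand-in UL compressor: code = W @ obs, reconstruction = W.T @ code
W = rng.normal(scale=0.01, size=(CODE_DIM, OBS_DIM))

# Linear value function on the compressed features: V(s) ~ v @ code
v = np.zeros(CODE_DIM)
ALPHA_UL, ALPHA_TD, GAMMA = 1e-3, 1e-2, 0.99

def compress(obs):
    return W @ obs

def ul_update(obs):
    """One gradient step on reconstruction error 0.5 * ||W.T @ (W @ obs) - obs||^2."""
    global W
    code = W @ obs
    err = W.T @ code - obs
    # gradient w.r.t. W, accounting for both occurrences of W in the reconstruction
    grad = np.outer(code, err) + np.outer(W @ err, obs)
    W -= ALPHA_UL * grad

def td_update(code, reward, next_code):
    """TD(0) update on the compressed state representation."""
    global v
    delta = reward + GAMMA * (v @ next_code) - (v @ code)
    v += ALPHA_TD * delta * code

# Coupled loop: the agent's behaviour generates the observations that train the
# compressor, while TD learning runs on the compressed codes of those observations.
obs = rng.normal(size=OBS_DIM)
for step in range(1000):
    code = compress(obs)
    next_obs = rng.normal(size=OBS_DIM)   # placeholder for env.step(action)
    reward = float(next_obs[0] > 0)       # placeholder reward signal
    ul_update(next_obs)
    td_update(code, reward, compress(next_obs))
    obs = next_obs
```

In this coupled loop the compressor only ever sees observations produced by the current behaviour, which is the sampling-bias concern raised in the next excerpt.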
“…Up to now, this general approach of using UL as a preprocessor has been studied only in the context of single-agent RL (i.e. TD, policy gradients, etc), where the agent simultaneously learns both the mapping from the compressed features to actions, and the features themselves from the observations provoked by the actions [1,4,11,13,14,16]. A problem with this coupled system is that, since the learned features depend on how the agent behaves and vice-versa, good features may never form because the agent's initial, poor policy results in biased sampling of the environment.…”
Section: Introduction (mentioning)
confidence: 99%