2012
DOI: 10.1007/s12064-011-0142-z

An information-theoretic approach to curiosity-driven reinforcement learning

Abstract: We provide a fresh look at the problem of exploration in reinforcement learning, drawing on ideas from information theory. First, we show that Boltzmann-style exploration, one of the main exploration methods used in reinforcement learning, is optimal from an information-theoretic point of view, in that it optimally trades expected return for the coding cost of the policy. Second, we address the problem of curiosity-driven learning. We propose that, in addition to maximizing the expected return, a learner shoul…
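
For readers unfamiliar with the Boltzmann-style exploration named in the abstract, the following is a minimal sketch of the generic textbook form: a softmax over action values with a temperature parameter. It is not taken from the paper. The function name `boltzmann_policy`, the `temperature` parameter, and the example action values are illustrative assumptions, and the paper's own formulation may differ in detail.

```python
import math


def boltzmann_policy(q_values, temperature):
    """Return action probabilities proportional to exp(Q / temperature).

    Hypothetical helper for illustration only: q_values is a list of
    estimated action values, temperature is a positive float.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


# Assumed example values, not data from the paper.
q = [1.0, 2.0, 3.0]
greedy_like = boltzmann_policy(q, temperature=0.1)    # concentrates on the best action
exploratory = boltzmann_policy(q, temperature=100.0)  # close to uniform
```

A low temperature favors return (the policy is nearly greedy), while a high temperature yields a nearly uniform policy, which is one informal way to read the trade-off between expected return and policy cost that the abstract describes.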

Cited by 142 publications (140 citation statements)
References: 21 publications
“…Heuristically, this means agents will try to avoid uninformative (low entropy) outcomes (e.g., closing one's eyes) while avoiding states that produce ambiguous (high-entropy) outcomes (e.g., a noisy discotheque) (Schwartenbeck, . This resolution of uncertainty is closely related to satisfying artificial curiosity (Schmidhuber, 1991;Still & Precup, 2012) and speaks to the value of information (Howard, 1966). It is also referred to as intrinsic value (see Barto, Singh, & Chentanez, 2004) for a discussion of intrinsically motivated learning).…”
Section: Free Energy and Expected Free Energy (mentioning)
confidence: 97%
“…This means that the most probable policies or paths are those that resolve uncertainty when navigating the lived world (Berlyne, 1950;Schmidhuber, 2006;Baranes and Oudeyer, 2009;Still and Precup, 2012;Barto et al, 2013;Moulin and Souchay, 2015). To achieve this, agents engage in some interactions that serve an epistemic rather than pragmatic purpose, i.e., epistemic actions (Kirsh and Maglio, 1994).…”
Section: Free Energy Revisited (mentioning)
confidence: 99%
“…It has been argued that predictive information reveals the causal structure of the physical process that generated the observed signal [21,26,28,36]. This approach has been useful in dynamical systems theory and chaos theory, as well as neuroscience and machine learning ( [14,16,17,20,21,23,28,29,36,37] and refs. therein).…”
Section: Information Theoretic Treatment Of Prediction (mentioning)
confidence: 99%
“…The data to be compressed, or summarized, are past experiences, and the summary should be useful for predicting future experiences. We can thus identify relevant information as information about future data [12][13][14][15][16][17][18][19][20]. The use of a model that is not overly complex is related to the goal of compression in communication.…”
Section: Introduction (mentioning)
confidence: 99%