2010
DOI: 10.1007/978-3-642-16108-7_29

Consistency of Feature Markov Processes

Abstract: We study long-term sequence prediction (forecasting). We approach this by investigating criteria for choosing a compact, useful state representation; the state is supposed to summarize useful information from the history. We want a method that is asymptotically consistent in the sense that it will provably eventually only choose between alternatives that satisfy an optimality property related to the criterion used. We extend our work to the case where there is side information that one can take advantage of a…
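Read formally, the consistency claim in the abstract can be sketched as follows (the notation is ours, since the truncated abstract does not fix it): writing φ̂_n for the representation selected from the class Φ after n observations, and Φ_opt ⊆ Φ for the alternatives satisfying the optimality property of the criterion, the selection almost surely eventually stays inside Φ_opt:

\[
P\bigl(\exists\, n_0\ \forall\, n \ge n_0:\ \hat\phi_n \in \Phi_{\mathrm{opt}}\bigr) \;=\; 1 .
\]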

Cited by 8 publications (8 citation statements) · References 12 publications
“…The predictive state representation [51] approach also lacks a general and principled learning algorithm. In contrast, initial consistency results for ΦMDP show that under some assumptions, ΦMDP agents asymptotically learn the correct underlying MDP [80].…”
Section: Feature Reinforcement Learning
confidence: 90%
“…FRL starts with a class of maps Φ, compares different φ ∈ Φ, and selects the most appropriate one given the experience h_t so far. Several criteria based on how well φ reduces P to an MDP have been devised [Hut09b, Hut09a] and theoretically [SH10] and experimentally [NSH11] investigated [Ngu13]. Theorems 5-9 show that demanding P^φ to be approximately MDP is overly restrictive.…”
Section: Feature Reinforcement Learning
confidence: 99%
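The selection step described in the statement above can be illustrated in a few lines. The following is a minimal, self-contained Python sketch: the map class, the specific predictive code-length criterion, and all names are our own illustrative placeholders, not the criteria devised in [Hut09b, Hut09a] or analyzed in [SH10].

```python
# Sketch of feature-map selection in Feature RL: given a finite class
# Phi of candidate maps (history prefix -> state), pick the map whose
# induced states best predict the next observation. Illustrative only.

import math
from collections import Counter

def state_sequence(phi, history):
    """Apply the feature map phi to every prefix of the history."""
    return [phi(history[:t + 1]) for t in range(len(history))]

def predictive_code_length(states, history):
    """Empirical code length (in bits) of each next observation given
    the current state. Real criteria also penalize model complexity."""
    counts = Counter(zip(states[:-1], history[1:]))
    context = Counter(s for s, _ in zip(states[:-1], history[1:]))
    return -sum(n * math.log2(n / context[s]) for (s, _), n in counts.items())

def select_map(Phi, history):
    """Return the name of the candidate map minimizing the criterion."""
    return min(Phi, key=lambda name: predictive_code_length(
        state_sequence(Phi[name], history), history))

# Toy example: on an alternating sequence, remembering the last
# observation predicts perfectly, while aggregating everything does not.
history = [0, 1] * 10
Phi = {"last_obs": lambda h: h[-1], "aggregate_all": lambda h: 0}
print(select_map(Phi, history))  # -> "last_obs"
```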
“…The interested reader is referred to [14] for more detailed analytical formulas, and [26] for further motivation and consistency proofs of the ΦMDP model.…”
Section: Cost Function
confidence: 99%
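As a sketch of the kind of criterion meant here (our paraphrase; see [14] for the exact analytical formulas), the ΦMDP cost of a map φ on history h_n is, roughly, a two-part code length over the induced state and reward sequences:

\[
\mathrm{Cost}(\phi \mid h_n) \;=\; \mathrm{CL}(s_{1:n} \mid a_{1:n}) \;+\; \mathrm{CL}(r_{1:n} \mid s_{1:n}, a_{1:n}),
\qquad s_t = \phi(h_t),
\]

where CL(·) denotes the code length of a sequence under a frequency-based (MDL-style) estimate, and the selected map is the minimizer \(\hat\phi = \arg\min_{\phi \in \Phi} \mathrm{Cost}(\phi \mid h_n)\).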
“…The recently introduced Feature Markov Decision Process (ΦMDP) framework [14] attempts to reduce actual RL tasks to MDPs for the purpose of attacking the general RL problem where the environment's model as well as the set of states are unknown. In [26], Sunehag and Hutter take a step further in the theoretical investigation of Feature Reinforcement Learning by proving consistency results. In this article, we develop an actual Feature Reinforcement Learning algorithm and empirically analyze its performance in a number of environments.…”
Section: Introduction
confidence: 99%