Proceedings of the 23rd International Conference on Machine Learning (ICML '06), 2006
DOI: 10.1145/1143844.1143861
Learning predictive state representations using non-blind policies

Cited by 21 publications (23 citation statements). References 6 publications.
“…[16,3]) and OOMs (e.g. [10]), none have been shown to learn models that are accurate enough for lookahead planning.…”
Section: Closing the Learning-Planning Loop with PSRs
confidence: 99%
“…We select an action a_{j+1} according to this distribution, we execute this action, and then we use this new information (o_{j+1}, a_{j+1}) to update our core test probabilities by using Equation 4. We append the event (o_{j+1}, a_{j+1}) to the end of the history h_j, which becomes h_{j+1}, and we repeat the same process for the next steps.…”
Section: Predictive Policy Representations (PPRs)
confidence: 99%
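The update this excerpt describes is the standard PSR-style prediction-vector update: conditioning the core-test probabilities on the new action-observation pair. A minimal sketch, assuming a prediction vector p(Q|h), a weight vector m_ao with p(ao|h) = p(Q|h)·m_ao, and a matrix whose columns are the weight vectors m_{aoq} (all names are hypothetical; the cited paper's Equation 4 is not reproduced here):

```python
import numpy as np

def update_core_test_probs(p_Q, m_ao, M_aoQ):
    """One PSR-style update of the core-test probability vector.

    p_Q   : current prediction vector p(Q | h_j)
    m_ao  : weight vector such that p(ao | h_j) = p_Q @ m_ao
    M_aoQ : matrix whose columns are the weight vectors m_{aoq}, q in Q
    Returns p(Q | h_{j+1}), where h_{j+1} = h_j followed by (a_{j+1}, o_{j+1}).
    """
    denom = p_Q @ m_ao            # p(ao | h_j): probability of the observed step
    return (p_Q @ M_aoQ) / denom  # p(q | h_j ao) = p(aoq | h_j) / p(ao | h_j)
```

Each call conditions the state on one executed action and its observation, exactly the role the excerpt assigns to its Equation 4 inside the act-observe-update loop.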
“…if the column of oaq in P̂ is linearly dependent on the columns of the core tests q ∈ Q, then find the vector m_oaq by solving the linear…”
Section: Learning PPRs vs. Learning FSCs
confidence: 99%
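The excerpted pseudocode checks whether the column for test oaq in the estimated matrix P̂ lies in the span of the core-test columns and, if so, solves a linear system for the weight vector m_oaq. A hedged numerical sketch of that check (the function name, tolerance, and least-squares formulation are assumptions, not the cited paper's exact procedure):

```python
import numpy as np

def weight_if_dependent(P_Q, p_oaq, tol=1e-8):
    """Return m_oaq with P_Q @ m_oaq ≈ p_oaq if the column p_oaq is
    (numerically) linearly dependent on the core-test columns P_Q;
    return None if it is independent (i.e. oaq would extend the core set)."""
    m, _, _, _ = np.linalg.lstsq(P_Q, p_oaq, rcond=None)
    if np.linalg.norm(P_Q @ m - p_oaq) < tol:
        return m   # oaq is predictable from the existing core tests
    return None    # residual too large: oaq carries new information
```

Least squares is used here instead of an exact solve so the dependence test degrades gracefully when P̂ is estimated from finite data.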
“…Prior work on learning predictive models has mostly taken the approach of learning a complete model of the system, with the objective of obtaining good predictive accuracy for all possible future behaviors, and then planning with the learned model. Thus such work either assumes the training data is acquired through interaction with the environment using a purely exploratory policy [4], or side-steps the exploration problem by learning from a batch of pre-sampled action-observation trajectories [2].…”
Section: Introduction
confidence: 99%