2020
DOI: 10.48550/arxiv.2010.02383
Preprint

Randomized Value Functions via Posterior State-Abstraction Sampling

Abstract: State abstraction has been an essential tool for dramatically improving the sample efficiency of reinforcement-learning algorithms. Indeed, by exposing and accentuating various types of latent structure within the environment, different classes of state abstraction have enabled improved theoretical guarantees and empirical performance. When dealing with state abstractions that capture structure in the value function, however, a standard assumption is that the true abstraction has been supplied or unrealistical…


Cited by 3 publications (3 citation statements). References 26 publications.
“…Indeed, algorithms like MuZero and its predecessors [Silver et al, 2017, Oh et al, 2017, Schrittwieser et al, 2020] never approximate reward functions and transition models with respect to the raw image observations generated by the environment, but instead incrementally learn some latent representation of state upon which a corresponding model is approximated for planning. This philosophy is borne out of several years of work that elucidate the importance of state abstraction as a key tool for avoiding the irrelevant information encoded in environment states and addressing the challenge of generalization for sample-efficient reinforcement learning in large-scale environments [Whitt, 1978, Bertsekas and Castañon, 1989, Dean and Givan, 1997, Ferns et al, 2004, Jong and Stone, 2005, Li et al, 2006, Van Roy, 2006, Ferns et al, 2012, Jiang et al, 2015, Abel et al, 2016, 2018, Dong et al, 2019, Du et al, 2019, Arumugam and Van Roy, 2020, Misra et al, 2020, Agarwal et al, 2020, Abel et al, 2020, Abel, 2020, Dong et al, 2021]. In this section, we briefly introduce a small extension of VSRL that builds on these insights to accommodate lossy MDP compressions defined on a simpler, abstract state space (also referred to as aleatoric or situational state by Lu et al [2021], Dong et al [2021]).…”
Section: Greater Compression Via State Abstraction
Mentioning confidence: 99%
“…As numerous sample-efficiency guarantees in reinforcement learning [Kearns and Singh, 2002, Kakade et al, 2003b, Strehl et al, 2009] bear a dependence on the size of the MDP state space, |S|, a large body of work has entertained state abstraction as a tool for improving the dependence on state space size without compromising performance [Whitt, 1978, Bertsekas et al, 1988, Singh et al, 1995, Gordon, 1995, Tsitsiklis and Van Roy, 1996, Dean and Givan, 1997, Ferns et al, 2004, Jong and Stone, 2005, Li et al, 2006, Van Roy, 2006, Ferns et al, 2012, Jiang et al, 2015a, Abel et al, 2016, Dong et al, 2019, Du et al, 2019, Misra et al, 2020, Arumugam and Van Roy, 2020, Abel, 2020]. Broadly speaking, a state abstraction φ : S → S_φ maps original or ground states of the MDP into abstract states in S_φ.…”
Section: State Abstraction in MDPs
Mentioning confidence: 99%
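The abstraction map φ : S → S_φ quoted above is simple to realize in code. Below is a minimal illustrative sketch (my own construction, not taken from the cited papers) on a toy grid-world whose ground states carry a reward-irrelevant noise bit; the state space and names are hypothetical.

```python
# Minimal sketch (illustrative only): a state abstraction phi : S -> S_phi
# that aggregates ground states of a toy grid-world by discarding a
# reward-irrelevant noise bit, so value functions live on the smaller S_phi.

from collections import defaultdict

# Hypothetical ground state space: (x, y, irrelevant_noise_bit)
ground_states = [(x, y, b) for x in range(4) for y in range(4) for b in (0, 1)]

def phi(s):
    """Abstraction that drops the reward-irrelevant noise bit."""
    x, y, _ = s
    return (x, y)

abstract_states = sorted({phi(s) for s in ground_states})
print(len(ground_states), "->", len(abstract_states))  # 32 -> 16

# An abstract value function is defined on S_phi and shared by every
# ground state that maps to the same abstract state.
V_abstract = defaultdict(float)

def V(s):
    return V_abstract[phi(s)]
```

Sharing values across all ground states with the same abstract image is what reduces the effective dependence on |S| referenced in the quote.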
“…Like TS, IDS also selects at random based on the posterior belief, but constructs the distribution from which this sample is drawn based on a trade-off of expected regret and expected information gain given the feedback on the upcoming action. IDS (and frequentist approximations thereof) has been applied to certain bandit, partial monitoring, and reinforcement learning problems (Liu et al, 2018; Kirschner and Krause, 2018; Kirschner et al, 2020a,b; Arumugam and Van Roy, 2020) and shown strong empirical and theoretical results, comparable to those for TS. We discuss the use of IDS for apple tasting in Section 4, and evaluate it empirically alongside TS in Section 5.…”
Section: Related Literature
Mentioning confidence: 99%
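To make the regret-versus-information trade-off concrete, here is a hedged sketch (my own simplification, not the cited authors' implementation) of variance-based Information-Directed Sampling for a Bernoulli bandit with Beta posteriors: the action distribution minimizes squared expected regret divided by an expected-information-gain proxy, and the minimizer mixes over at most two arms.

```python
# Sketch of variance-based IDS for a Bernoulli bandit (assumed setup, not the
# authors' code). Chooses the action distribution minimizing
#   (expected regret)^2 / (expected information gain proxy).

import numpy as np

rng = np.random.default_rng(0)

def ids_action(alpha, beta, n_samples=2000):
    k = len(alpha)
    mu = rng.beta(alpha, beta, size=(n_samples, k))   # posterior samples of arm means
    best = mu.argmax(axis=1)                          # sampled optimal arm per draw

    # Expected regret of each arm under the posterior.
    delta = (mu.max(axis=1)[:, None] - mu).mean(axis=0)

    # Variance-based information gain proxy:
    #   g_a = sum_{a*} P(A* = a*) * (E[mu_a | A* = a*] - E[mu_a])^2
    mean_mu = mu.mean(axis=0)
    g = np.zeros(k)
    for a_star in range(k):
        mask = best == a_star
        if mask.any():
            g += mask.mean() * (mu[mask].mean(axis=0) - mean_mu) ** 2
    g = np.maximum(g, 1e-12)

    # The optimal IDS distribution mixes at most two arms: sweep mixture weights.
    best_ratio, best_choice = np.inf, (0, 0, 1.0)
    qs = np.linspace(0.0, 1.0, 101)
    for i in range(k):
        for j in range(k):
            d = qs * delta[i] + (1 - qs) * delta[j]
            info = qs * g[i] + (1 - qs) * g[j]
            ratio = d ** 2 / info
            m = ratio.argmin()
            if ratio[m] < best_ratio:
                best_ratio, best_choice = ratio[m], (i, j, qs[m])
    i, j, q = best_choice
    return i if rng.random() < q else j

# Usage: maintain Beta(alpha, beta) posteriors per arm and call each round.
alpha, beta = np.ones(3), np.ones(3)
print(ids_action(alpha, beta))
```

Unlike Thompson sampling, which draws an action in proportion to its posterior probability of being optimal, this procedure will deliberately favor an arm with higher expected regret when observing it is expected to reveal more about which arm is best.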