2020
DOI: 10.48550/arxiv.2010.08843
Preprint

Approximate information state for approximate planning and reinforcement learning in partially observed systems


Cited by 4 publications (7 citation statements) · References 0 publications
“…Many other IPMs have been considered in the literature, including the Kolmogorov distance, the bounded Lipschitz metric, and maximum mean discrepancy. See, for example, Subramanian et al. (2020). The choice of the metric often depends on the specific properties of the model.…”
Section: Approximate Game
confidence: 99%
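For readers of the quoted passage: an integral probability metric (IPM) compares two probability measures through a class of test functions. The standard definition (a well-known formula, not taken from this page) is

d_{\mathfrak{F}}(\mu, \nu) = \sup_{f \in \mathfrak{F}} \left| \int f \, d\mu - \int f \, d\nu \right|,

and the metrics named in the quote correspond to different choices of \mathfrak{F}: indicators of half-lines give the Kolmogorov distance, the unit ball of bounded Lipschitz functions gives the bounded Lipschitz metric, and the unit ball of an RKHS gives maximum mean discrepancy.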
“…For now we note that here the conditions we set for the action compression from Ω(Γ_t) to Ω(Λ_t) are on the private states instead of defining an encapsulation directly on the actions (i.e., prescriptions); moreover, the compression may depend on the common state h_t^0 as well. Hence, this falls outside of the action compression scheme studied in Subramanian et al. [2020]. We bound the error between the value functions obtained from Algorithm 2 and the optimal value functions obtained from Algorithm 1 in the following theorem, proved in Section 4.2.…”
Section: Compressing Private States
confidence: 99%
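To make the quoted distinction concrete, here is a minimal Python sketch (all names hypothetical, not from the cited paper) of how a compression map on private states, possibly depending on the common state h_t^0, induces a prescription on the original private states from one defined on the compressed space:

```python
from typing import Callable, Hashable

# Hypothetical type aliases for illustration only.
PrivateState = Hashable
CompressedState = Hashable
CommonState = Hashable
Action = Hashable

def lift_prescription(
    gamma_hat: Callable[[CompressedState], Action],  # prescription on compressed private states
    phi: Callable[[PrivateState, CommonState], CompressedState],  # private-state compression map
    h0_t: CommonState,  # common state the compression may depend on
) -> Callable[[PrivateState], Action]:
    """Route each private state through the compression, then apply
    the compressed-space prescription."""
    def gamma(s: PrivateState) -> Action:
        return gamma_hat(phi(s, h0_t))
    return gamma
```

This mirrors the quoted point: the conditions are placed on the private-state compression phi rather than on the prescriptions themselves.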
“…Kara and Yuksel [2020] consider a special type of AIS, the N-memory, which contains the information from the last N steps. Here, the compression function is fixed, but in contrast to Subramanian et al. [2020], the approximation error given each history need not be uniform. When the model is known, they provide conditions that bound the regret of N-memory policies (policies that depend on the N-memory), and an algorithm that finds optimal policies within this class.…”
Section: A Supplementary Details
confidence: 99%
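As an illustration of the N-memory idea described in the quote, here is a minimal sketch (hypothetical class, not the authors' code) of a fixed-length window over the last N observation-action pairs:

```python
from collections import deque

class NMemory:
    """Fixed-size window over the last N (observation, action) pairs,
    used as a compressed state in place of the full history."""

    def __init__(self, n: int):
        self.n = n
        self._buffer = deque(maxlen=n)

    def update(self, observation, action) -> None:
        # The oldest entry is dropped automatically once the window is full.
        self._buffer.append((observation, action))

    def state(self) -> tuple:
        # Left-pad with (None, None) so the state has fixed length n
        # even before N steps have elapsed.
        pad = [(None, None)] * (self.n - len(self._buffer))
        return tuple(pad + list(self._buffer))
```

An N-memory policy is then any map from this fixed-length tuple to actions; the regret bounds mentioned in the quote concern how much is lost by restricting attention to this class.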