2013
DOI: 10.1007/s10994-013-5368-1

Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model

Abstract: We consider the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs). We prove new PAC bounds on the sample-complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP: value iteration and policy iteration. The first result indicates that for an MDP with N state-action pairs and the discount factor γ ∈ [0, 1) only O(N log(N/δ)/((1 − γ)^3 ε^2)) state-transition samp…
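To get a feel for how the rate quoted in the abstract scales, the following is a minimal sketch that evaluates the expression N log(N/δ)/((1 − γ)^3 ε^2). The function name `sample_bound_scaling` and the constant `c` are hypothetical: the big-O notation hides a constant that the abstract does not state, so the returned numbers are only meaningful as ratios, not as actual sample counts.

```python
import math


def sample_bound_scaling(n_pairs, gamma, eps, delta, c=1.0):
    """Evaluate c * N * log(N / delta) / ((1 - gamma)^3 * eps^2).

    n_pairs: number of state-action pairs N
    gamma:   discount factor in [0, 1)
    eps:     target accuracy (must be positive)
    delta:   failure probability in (0, 1)
    c:       hypothetical constant; the O(.) in the abstract hides the real one
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if n_pairs < 1:
        raise ValueError("n_pairs must be at least 1")
    horizon_term = (1.0 - gamma) ** 3
    return c * n_pairs * math.log(n_pairs / delta) / (horizon_term * eps ** 2)


if __name__ == "__main__":
    base = sample_bound_scaling(1000, gamma=0.9, eps=0.1, delta=0.05)
    longer_horizon = sample_bound_scaling(1000, gamma=0.95, eps=0.1, delta=0.05)
    tighter_eps = sample_bound_scaling(1000, gamma=0.9, eps=0.05, delta=0.05)
    # Ratios are independent of the unknown constant c.
    print("halving (1 - gamma):", longer_horizon / base)
    print("halving eps:", tighter_eps / base)
```

Comparing ratios rather than absolute values sidesteps the unknown constant: halving 1 − γ multiplies the expression by about 8 (cubic dependence), while halving ε multiplies it by about 4 (quadratic dependence).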

Citations: cited by 119 publications (184 citation statements)
References: 12 publications (10 reference statements)
“…It is interesting to compare the upper bound (27) with a worst case upper bound on Σ_PolOpt(r, P, γ). In light of Lemma 7 from the paper [AMK13], assuming R_max ≤ 1, and ‖r‖_∞ ≤ 1 for simplicity, we have Σ_PolOpt(r, P, γ) ≤ 1/(1 − γ)^1.5.…”
Section: A Conservative Yet Useful Upper Bound (mentioning)
confidence: 98%
“…Much of the focus in the past has been on understanding TD-type algorithms with instance-dependent analyses: function approximation under the ℓ_2 error [BRS18, DBGM18, XWZL20], tabular setting under the ℓ_∞-error [KXWJ21, PW21], or under kernel function approximation [DWW21]. Many of these results established instance-specific guarantees that improve upon global worst-case bounds [AMK13]. In particular, the paper [KPR+21] establishes a local minimax lower-bound in the tabular setting and proposes a procedure that achieves it.…”
Section: Related Work (mentioning)
confidence: 99%
“…In 1981, Chile privatized its traditional public pension system and created a fully funded (FF) private pension system (PPS) with pension fund managers, Administradoras de Fondos de Pensiones (AFP), that were exclusively designated to invest workers' retirement savings (Holzmann & Stiglitz, 2001; Piñera, 1991). The Chilean reform shut down the national public pension system (NPS) and diverged from the International Labour Organization's (ILO) social protection principles (ILO, 1952, 2012a, 2012b). The impetus for similar reforms in other Latin American countries was supported by the World Bank (WB) and the Inter-American Development Bank (IADB), in efforts to reduce the public pension debt due to aging and to address the sustainability of pay-as-you-go (PAYG) (World Bank, 1994).…”
Section: Structural Reforms and Re-reforms In Latin America (mentioning)
confidence: 99%
“…In Latin America, 13 countries, including Peru, have adopted social pensions to mitigate poverty in old age (ECLAC, 2019). Overall, these transfers help pay living expenses, but they do not ensure the wellness of older people (ILO, 2014;Olivera, 2016b;Rofman, Apella, & Vezza, 2014). Such individuals' needs go beyond income and include access to health care, optimal housing, transportation, and other services (UN, 2002).…”
Section: Structural Reforms and Re-reforms In Latin America (mentioning)
confidence: 99%