Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch

Zhang, Shangtong; Combes, Remi Tachet des; Laroche, Romain

doi:10.48550/arxiv.2111.02997

Cited by 1 publication

(1 citation statement)

References 37 publications

“…Non-asymptotic analyses of critic-only methods have been studied extensively in recent years, e.g., TD (Lakshminarayanan & Szepesvari, 2018; Bhandari et al, 2018; Cai et al, 2019; Sun et al, 2019), SARSA (Zou et al, 2019), and the gradient TD (GTD) method (Dalal et al, 2018; Xu et al, 2019; Wang et al, 2021; 2017; Liu et al, 2015; Gupta et al, 2019; Kaledin et al, 2020; Ma et al, 2020; Wang & Zou, 2020). There are also non-asymptotic analyses of actor-only methods, e.g., (Bhandari & Russo, 2021; Agarwal et al, 2021; Mei et al, 2020; Li et al, 2021a; Laroche & des Combes, 2021; Zhang et al, 2021; Cen et al, 2021; Zhang et al, 2020a; Lin, 2022). In this paper, we focus on AC and NAC algorithms, where how the errors in the actor and the critic affect each other needs to be analyzed.…”

Section: Related Work

Citation type: mentioning

confidence: 99%