2022
DOI: 10.48550/arxiv.2201.08518
Preprint
Optimal variance-reduced stochastic approximation in Banach spaces

Abstract: We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-as…
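The variance-reduction idea in the abstract can be sketched on a toy contractive operator. The scheme below is a generic SVRG-style recentering under illustrative assumptions (a linear operator on R^2, multiplicative scalar noise, shared randomness between paired queries), not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy contractive operator on R^2: F(x) = A x + b with ||A|| < 1,
# so F has a unique fixed point x* = (I - A)^{-1} b.
A = np.array([[0.5, 0.1], [0.0, 0.4]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(np.eye(2) - A, b)

def F_sample(x, xi):
    # Stochastic query model: one noisy, unbiased evaluation of F,
    # with shared scalar randomness xi (E[xi] = 0).
    return A @ x + b + 0.1 * xi * x

def vr_fixed_point(x0, epochs=20, inner=200, batch=2000, step=0.5):
    # SVRG-style recentering: estimate F at a reference point once
    # per epoch with a large batch, then recenter each noisy query.
    x_ref = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        xis = rng.standard_normal(batch)
        F_ref = np.mean([F_sample(x_ref, xi) for xi in xis], axis=0)
        x = x_ref.copy()
        for _ in range(inner):
            xi = rng.standard_normal()
            # Shared xi makes the noise difference vanish as x -> x_ref.
            v = F_sample(x, xi) - F_sample(x_ref, xi) + F_ref
            x = (1 - step) * x + step * v
        x_ref = x
    return x_ref

x_hat = vr_fixed_point(np.zeros(2))
err = float(np.linalg.norm(x_hat - x_star))
print(f"estimation error: {err:.4f}")
```

Because the two queries in each inner step share the same xi, the noise contribution scales with the distance to the reference point rather than with a fixed variance floor, which is the mechanism variance reduction exploits.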

Cited by 3 publications (9 citation statements)
References 27 publications
“…The growth condition requires the incremental update H(x, ξ) to grow at most linearly in both x and a non-negative function g : Ξ → R that captures the contribution of the data ξ to the norm growth of H(x, ξ). It should be emphasized that we assume {g(ξ_t)}_{t≥0} has uniformly bounded p-th moments, which is much milder than the almost-sure uniform boundedness assumed in previous work [Chen et al., 2021c, Doan et al., 2020, Mou et al., 2022a].…”
Section: Consistency Guarantee
confidence: 99%
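The quoted growth and moment conditions can be illustrated with a minimal numerical sketch; the update H, the function g, and the constant C below are hypothetical, chosen only to satisfy the stated inequality:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D incremental update H(x, xi) = -x + xi, with
# g(xi) = |xi| capturing the data's contribution to norm growth.
def H(x, xi):
    return -x + xi

def g(xi):
    return abs(xi)

# Linear growth: |H(x, xi)| <= C * (1 + |x| + g(xi)) with C = 1,
# since |-x + xi| <= |x| + |xi| by the triangle inequality.
C = 1.0
xs = rng.uniform(-5.0, 5.0, size=10_000)
xis = rng.standard_t(df=3, size=10_000)  # heavy-tailed data
assert all(abs(H(x, xi)) <= C * (1.0 + abs(x) + g(xi))
           for x, xi in zip(xs, xis))

# Student-t(3) noise has bounded p-th moments for p < 3 but is NOT
# almost surely bounded, so the milder moment assumption covers
# data that the uniform-boundedness assumption excludes.
p = 2
moment_p = float(np.mean(np.abs(xis) ** p))
print(f"empirical E[g(xi)^{p}] = {moment_p:.2f}")
```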
“…A sufficient condition for (3) is almost-sure Lipschitz continuity, meaning that |H(x, ξ) − H(y, ξ)| ≤ L_H |x − y| holds for all x, y ∈ R^d and all ξ ∈ Ξ. This type of condition is commonly used in machine learning, as demonstrated by the A2 condition in [Mou et al., 2022a].…”
Section: Assumption 1 (Local Linearity)
confidence: 99%
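A quick numerical check of almost-sure Lipschitz continuity, using a hypothetical update H and noise restricted to [−1, 1] (both chosen here only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical H(x, xi) = clip(x, -1, 1) + 0.1 * xi * x with
# xi in [-1, 1]: clip is 1-Lipschitz and the second term adds at
# most 0.1 * |xi| <= 0.1, so |H(x, xi) - H(y, xi)| <= L_H |x - y|
# with L_H = 1.1 uniformly over xi (almost-sure Lipschitzness).
L_H = 1.1

def H(x, xi):
    return float(np.clip(x, -1.0, 1.0) + 0.1 * xi * x)

worst = 0.0
for _ in range(5_000):
    x, y = rng.uniform(-3.0, 3.0, size=2)
    xi = rng.uniform(-1.0, 1.0)
    ratio = abs(H(x, xi) - H(y, xi)) / abs(x - y)
    worst = max(worst, ratio)

print(f"worst observed ratio {worst:.3f} <= L_H = {L_H}")
assert worst <= L_H + 1e-9
```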
“…Consequently, existing on-policy first-order policy optimization methods require restrictive assumptions, e.g., that all the policy iterates are sufficiently random, which is problematic when the optimal policy does not possess this structure (e.g., when it is deterministic). In spite of intensive research effort on on-policy evaluation (Tsitsiklis and Van Roy, 1999; Yu and Bertsekas, 2009; Zhang et al., 2021b; Mou et al., 2022), one seemingly unresolved problem in RL is whether one can design sample-efficient on-policy evaluation algorithms for insufficiently random policies and use them for policy optimization (see Remark 1 of Lan, 2022).…”
Section: Introduction
confidence: 99%
“…In addition, they establish a convergence analysis in the ℓ2-norm, which is not a natural metric for the underlying problem, thus leading to worse dependence on other problem parameters, e.g., the dimension of the transition kernel. It is also noteworthy that recent work by Mou et al. (2022) proposed a variance-reduced stochastic approximation approach that solves the AMDP policy evaluation problem in the span semi-norm under the generative model. However, their results do not directly extend to the Markovian noise setting.…”
confidence: 99%
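For reference, the span semi-norm mentioned above is span(v) = max_i v_i − min_i v_i. A short sketch of why it is only a semi-norm, and why it suits average-reward MDP (AMDP) policy evaluation, where value functions are identified only up to an additive constant:

```python
import numpy as np

def span(v):
    # Span semi-norm: max(v) - min(v).
    return float(np.max(v) - np.min(v))

v = np.array([1.0, 3.5, -2.0])

# Invariant to additive shifts -- exactly the ambiguity of AMDP
# value functions, which are defined only up to a constant offset.
assert span(v + 10.0) == span(v)

# Zero on nonzero constant vectors, so span is a semi-norm, not a norm.
assert span(np.full(4, 7.0)) == 0.0

print(span(v))  # 5.5
```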