“…However, in asynchronous reinforcement learning (RL) [Tsitsiklis, 1994, Even-Dar et al., 2003], data is generated along a single Markov chain, precluding the use of stochastic optimization methods. Inspired by resampling-based inference methods in stochastic optimization, bootstrap-based methods have been developed for linear policy evaluation tasks [White and White, 2010, Hanna et al., 2017, Hao et al., 2021, Ramprasad et al., 2021]. However, they are not suitable for nonlinear tasks, such as quantifying randomness in the optimal value function.…”