2021
DOI: 10.1007/s10994-020-05912-5
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling

Abstract: We propose a stochastic approximation (SA) based method with randomization of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm. Our proposed scheme is equivalent to running regular temporal difference learning with linear function approximation, albeit with samples picked uniformly from a given dataset. Our method results in an O(d) improvement in complexity in comparison to LSTD, where d is the dimension of the data. We provide non-asymptotic bounds for our proposed m…
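The scheme described in the abstract — regular TD(0) with linear function approximation, but with each transition drawn uniformly at random from a fixed batch — can be sketched as below. This is a minimal illustration, not the paper's exact algorithm: the function name, step-size choice, and iteration count are assumptions, and the paper's step-size schedule and iterate averaging are omitted.

```python
import numpy as np

def td0_uniform_batch(dataset, phi, d, gamma=0.9, alpha=0.01, n_iters=10_000, seed=0):
    """TD(0) with linear function approximation, sampling transitions
    uniformly from a fixed batch (an illustrative sketch).

    dataset: list of transitions (s, r, s_next)
    phi: feature map, state -> np.ndarray of shape (d,)
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)
    n = len(dataset)
    for _ in range(n_iters):
        s, r, s_next = dataset[rng.integers(n)]          # uniform sample from batch
        f, f_next = phi(s), phi(s_next)
        delta = r + gamma * (f_next @ theta) - f @ theta  # TD error
        theta += alpha * delta * f                        # O(d) work per update
    return theta
```

Each update touches only one d-dimensional feature vector, which is the source of the O(d) per-iteration advantage over LSTD's O(d^2) matrix updates.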

Cited by 9 publications (22 citation statements)
References 22 publications
“…Therefore, the random variable Z is a better unbiased estimator of μ than X. The formula shows that the variance of Z is small as long as X is sufficiently correlated with Y [3]; for this reason Y is called the control variable of X. This is the control variable method.…”
Section: Controlled Variable Method
confidence: 99%
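The control variable (control variate) construction quoted above — Z = X − c(Y − E[Y]), which is unbiased for μ = E[X] and has lower variance when X and Y are correlated — can be illustrated with a standard toy estimate of E[e^U] for U uniform on [0, 1]. The choice of X, Y, and the coefficient formula c* = Cov(X, Y)/Var(Y) are textbook conventions, not taken from the cited paper; estimating c from the same samples introduces a small bias that is commonly ignored in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
u = rng.uniform(0.0, 1.0, n)
x = np.exp(u)        # X: quantity of interest, E[X] = e - 1
y = u                # Y: control variable with known mean E[Y] = 0.5

# Near-optimal coefficient c* = Cov(X, Y) / Var(Y), estimated from the samples.
c = np.cov(x, y)[0, 1] / np.var(y)

# Z = X - c (Y - E[Y]): same mean as X, smaller variance when X, Y correlate.
z = x - c * (y - 0.5)
```

Here Var(Z) drops well below Var(X) because e^U and U are strongly correlated, while the sample mean of Z still estimates e − 1.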
See 1 more Smart Citation
“…Therefore, it is better to use the random variable Z as an unbiased estimate of μ than X. From the formula, it can be seen that the variance of Z is sufficiently small as long as the random variable X is guaranteed to show a certain correlation with Y [3], so Y is also called the control variable of X. This is the control variable method.…”
Section: Controlled Variable Methodsmentioning
confidence: 99%
“…For the SGD variance problem, there are currently three mainstream methods for reducing sampling variance: importance sampling, stratified sampling, and the control variable method. The objective function in machine learning is usually minimized using Batch Gradient Descent (BGD) or SGD [3]. The BGD algorithm computes the gradients of all samples in each iteration to perform the weight update, while SGD randomly selects one training sample at a time and updates the parameters using that sample's gradient.…”
Section: Introduction
confidence: 99%
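The BGD/SGD contrast in the excerpt above can be made concrete on a least-squares objective: BGD averages the gradient over the whole dataset each step, while SGD uses one randomly chosen sample's gradient, a higher-variance but much cheaper estimate. This is a generic sketch under assumed data and step sizes, not code from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

# BGD: gradient over all samples per iteration (cost O(n) per step).
w_bgd = np.zeros(3)
for _ in range(500):
    grad = X.T @ (X @ w_bgd - y) / len(y)   # full-batch gradient
    w_bgd -= 0.1 * grad

# SGD: one uniformly sampled gradient per iteration (cost O(1) per step).
w_sgd = np.zeros(3)
for _ in range(5000):
    i = rng.integers(len(y))
    grad_i = X[i] * (X[i] @ w_sgd - y[i])   # single-sample gradient
    w_sgd -= 0.01 * grad_i
```

Both reach a neighbourhood of the least-squares solution; the SGD iterates fluctuate around it because of the sampling variance that the three methods above are designed to reduce.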
“…) in (33) by using arguments similar to those used in arriving at Eq. (79) in [37]. In particular, the latter bound uses Jensen's inequality and the convexity of f(x) = x^(−2α) exp(x^(1−α)).…”
Section: A3 Proof of Theorem
confidence: 99%
“…Analysis of TD algorithms is challenging, and researchers have devoted significant effort to studying their asymptotic properties [7,11,15,19]. In recent years, there has been interest in characterising the finite-time behaviour of TD, and several papers [1,2,3,9,13] have tackled this problem under various assumptions. For T iterations/updates, most existing works provide either an O(1/√T) (with universal step-size) [1,3] or an O(1/T) (with constant step-size) [1,9,13] convergence rate to the TD fixed point θ⋆ (see Section 2 for the notational information).…”
Section: Introduction
confidence: 99%