2012
DOI: 10.1007/s10107-012-0572-5

Sample size selection in optimization methods for machine learning

Abstract: This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large-scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient. We establish an O(1/ε) complexity bound on the total cost of a gradient method. The second part of the …
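The variance-based criterion described in the abstract can be sketched concretely. The snippet below is a minimal illustration, assuming the test has the form ‖Var_{i∈S}(∇f_i(x))‖₁ / |S| ≤ θ² ‖∇f_S(x)‖² and that, when the test fails, the sample is enlarged just enough for the inequality to hold with the current variance estimate; the names (`sample_size_test`, `theta`, `grad_samples`) are illustrative rather than taken from the paper.

```python
import numpy as np

def sample_size_test(grad_samples, theta=0.9):
    """Variance-based check on a sampled (batch) gradient.

    grad_samples: array of shape (|S|, d) holding the per-example
    gradients grad f_i(x) for the current sample S.
    Returns (passed, suggested_size).
    """
    S = grad_samples.shape[0]
    g_S = grad_samples.mean(axis=0)            # sampled gradient
    var = grad_samples.var(axis=0, ddof=1)     # componentwise sample variance
    lhs = var.sum() / S                        # ||Var||_1 / |S|
    rhs = theta**2 * float(g_S @ g_S)          # theta^2 * ||g_S||^2
    if lhs <= rhs:
        return True, S                         # sample deemed accurate enough
    # enlarge the sample so the inequality would hold for the same variance
    return False, int(np.ceil(var.sum() / rhs))
```

The suggested size simply rearranges the inequality for |S| while keeping the current variance estimate fixed.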

Cited by 281 publications (279 citation statements). References 27 publications (43 reference statements).
“…Finally, Byrd et al. (2012) have developed methods to deal with the mini-batch overfitting problem, which are based on heuristics that increase the mini-batch size and also terminate CG early, according to estimates of the variance of the gradient and curvature-matrix vector products. While this is a potentially effective approach (which we don't have experience with), there are several problems with it, in theory.…”
Section: Mini-batch Overfitting and Methods to Combat It (mentioning)
confidence: 99%
“…If $\{p_i\}$ is finite, then this result trivially follows. Otherwise, since $\delta_k \to 0$, we have that for sufficiently large $p_i$, $\delta_{p_i} < b$, with $b$ defined by (5). Since $\|\nabla f(x_{p_i})\| > \epsilon$, and $m_{p_i}$ is fully linear on $B(x_{p_i}, \Delta_{p_i})$, then by the derivations in Theorem 4.2, we have $\|g_{p_i}\| \geq \epsilon/2$, and by Lemma 3.2, $\rho_{p_i} \geq \eta_1$.…”
Section: The Lim-type Convergence (mentioning)
confidence: 96%
“…The resulting methods may be very simple and enjoy low per-iteration complexity, but the practical performance of these approaches can be very poor. On the other hand, it was noted in [5] that the performance of stochastic gradient methods for large-scale machine learning improves substantially if the sample size is increased during the optimization process. Within direct search, the use of random positive spanning sets has also been recently investigated [1,34] with gains in performance and convergence theory for nonsmooth problems.…”
Section: Motivation (mentioning)
confidence: 99%
“…In [4] an adaptive sample size strategy was proposed in the setting where $\nabla f(x) = \sum_{i=1}^{N} \nabla f_i(x)$, for large values of $N$. In this case computing $\nabla f(x)$ accurately can be prohibitive; hence, an estimate $\nabla f_S(x) = \sum_{i \in S} \nabla f_i(x)$ is often computed instead, in the hope that it provides a good estimate of the gradient and a descent direction.…”
Section: Stochastic Gradients and Batch Sampling (mentioning)
confidence: 99%
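To make the subsampled estimate in the last statement concrete, here is a minimal sketch, assuming per-component gradient callables and an optional N/|S| rescaling to keep the estimate unbiased for the full sum; the names (`subsampled_gradient`, `grad_fns`) are illustrative and not from the cited work, which defines ∇f_S simply as the sum over S.

```python
import numpy as np

def subsampled_gradient(grad_fns, x, sample_size, rng=None):
    """Estimate sum_{i=1}^N grad f_i(x) from a random subset S.

    grad_fns: list of N callables, each returning grad f_i(x) as an array.
    The N/|S| rescaling (an assumption here) makes the estimate unbiased
    for the full sum; dropping it gives the plain sum over S used in the
    quoted statement.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = len(grad_fns)
    idx = rng.choice(N, size=min(sample_size, N), replace=False)
    g_S = sum(grad_fns[i](x) for i in idx)
    return (N / len(idx)) * g_S
```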