Second-Order Stochastic Optimization for Machine Learning in Linear Time
Preprint, 2016
DOI: 10.48550/arxiv.1602.03943

Abstract: First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity. Second-order methods, while able to provide faster convergence, have been much less explored due to the high cost of computing the second-order information. In this paper we develop second-order stochastic methods for optimization problems in machine learning that match the per-iteration cost of gradient based methods, and in certain settings improve upon the overall…

Cited by 19 publications (57 citation statements). References 9 publications (13 reference statements).
“…This provides evidence to the necessity of adaptive sampling schemes, and a dimension-dependent analysis, which indeed accords with some recently proposed algorithms and derivations, e.g. Agarwal et al [2016], Xu et al [2016]. We note that the limitations arising from oblivious optimization schemes (in a somewhat stronger sense) was also explored in Arjevani and Shamir [2016a,b].…”
Section: Introduction (supporting)
confidence: 84%
“…Therefore, we conjecture that this approach cannot lead to better worst-case results. Agarwal et al. [2016] develop another line of stochastic second-order methods, which are based on the observation that the Newton step $(\nabla^2 F(w))^{-1} \nabla F(w)$ is the solution of the system of linear equations…”
Section: Comparison To Existing Approaches (mentioning)
confidence: 99%
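The quoted sentence is cut off at the linear system. As a hedged reconstruction of the relation this line of work rests on (notation follows the quote; the scaling condition and the sampled-Hessian recursion reflect the standard LiSSA presentation and are stated here as an assumption, not as a quotation):

\[
\nabla^2 F(w)\, p \;=\; \nabla F(w)
\qquad\Longleftrightarrow\qquad
p \;=\; \bigl(\nabla^2 F(w)\bigr)^{-1} \nabla F(w),
\]

so the Newton direction can be estimated by approximately solving a linear system rather than by inverting the Hessian. With the problem scaled so that \(\|I - \nabla^2 F(w)\| < 1\), the Neumann series and its partial-sum recursion

\[
\bigl(\nabla^2 F(w)\bigr)^{-1} \;=\; \sum_{i=0}^{\infty} \bigl(I - \nabla^2 F(w)\bigr)^{i},
\qquad
p_0 = \nabla F(w), \quad
p_j \;=\; \nabla F(w) + \bigl(I - \widehat{\nabla^2 F}(w)\bigr)\, p_{j-1},
\]

turn the solve into a sequence of Hessian-vector products, where \(\widehat{\nabla^2 F}(w)\) denotes a sampled (single-example or mini-batch) Hessian, which is what keeps the per-iteration cost comparable to a gradient step.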
“…Fortunately, we do not need to calculate the full Hessian, only Hessian-vector products [e.g., $H_\theta^{-1} \nabla L_r$ in Eqs. (1) and (3)] or the top part of the Hessian spectrum, which significantly reduces the computational complexity of the problem [65]. The inverse of the Hessian for influence functions, RUE, and RelatIF can be approximated with so-called stochastic approximation with LiSSA [65].…”
Section: F. Practical Aspects of the Hessian Computation (mentioning)
confidence: 99%
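For concreteness, a minimal sketch of a Hessian-vector product that never materializes the Hessian, here for an L2-regularized logistic-regression loss; the loss, function name, and variable names are illustrative assumptions and are not taken from the paper or from the citing work:

    import numpy as np

    def hvp_logistic(theta, v, X, y, lam):
        """Return H(theta) @ v for
        F(theta) = mean_i log(1 + exp(-y_i * x_i @ theta)) + lam/2 * ||theta||^2,
        using only matrix-vector products: O(n*d) time, no d-by-d matrix."""
        margins = y * (X @ theta)            # (n,) margins, labels y in {-1, +1}
        s = 1.0 / (1.0 + np.exp(-margins))   # sigmoid of the margins
        d = s * (1.0 - s)                    # per-example curvature weights
        return X.T @ (d * (X @ v)) / X.shape[0] + lam * v

In autodiff frameworks the same product can be obtained by differentiating the gradient along v (forward-over-reverse differentiation), so each Hessian-vector product costs only a small constant factor more than one gradient evaluation.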
“…(1) and (3)] or the top part of the Hessian spectrum, which significantly reduces the computational complexity of the problem [65]. The inverse of the Hessian for influence functions, RUE, and RelatIF can be approximated with so-called stochastic approximation with LiSSA [65]. Additionally, the authors of RelatIF approximated the normalization factor in Eq.…”
Section: F. Practical Aspects of the Hessian Computation (mentioning)
confidence: 99%
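For context, a minimal sketch of the kind of stochastic inverse-Hessian approximation the quote refers to: a LiSSA-style Neumann recursion that needs only Hessian-vector products. The function name, the fixed step count, and the scale argument are illustrative assumptions; the estimator described in [65] additionally samples a fresh mini-batch Hessian at every step and averages several independent repetitions.

    def inverse_hvp_lissa(hvp, g, num_steps=100, scale=1.0):
        """Approximate H^{-1} @ g via the recursion
        p_0 = g,  p_j = g + (I - H/scale) p_{j-1},
        which converges to (H/scale)^{-1} g = scale * H^{-1} g when 0 < H/scale <= I."""
        p = g.copy()
        for _ in range(num_steps):
            p = g + p - hvp(p) / scale   # one Hessian-vector product per step
        return p / scale                 # undo the scaling: approximate H^{-1} g

Plugged into the earlier sketch, inverse_hvp_lissa(lambda v: hvp_logistic(theta, v, X, y, lam), grad) with scale chosen at least as large as the top Hessian eigenvalue estimates the quantity written as $H_\theta^{-1} \nabla L_r$ in the quote, without ever forming or factoring the Hessian.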