2004
DOI: 10.1023/b:joth.0000013559.37579.b2
A Stochastic Approximation Algorithm with Step-Size Adaptation

Abstract: We consider the following stochastic approximation algorithm for finding the zero point x* of a function ϕ: x_{t+1} = x_t − γ_t y_t, y_t = ϕ(x_t) + ξ_t, where y_t are observations of ϕ and ξ_t is random noise. The step sizes γ_t of the algorithm are random, with the increment γ_{t+1} − γ_t depending on γ_t and on y_t y_{t−1} in a rather general form. Roughly, γ_t increases when y_t y_{t−1} > 0 and decreases otherwise. It is proved that the algorithm converges to x* almost surely. This result…
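The update and the sign-based adaptation rule described in the abstract can be sketched in code. This is a minimal illustration only: the multiplicative up/down factors, the noise model, and the test function ϕ(x) = x − 2 are assumptions made for this sketch, not the general adaptation rule analysed in the paper.

```python
import numpy as np

def adaptive_stochastic_approximation(phi, x0, gamma0=0.1, noise_std=0.5,
                                      up=1.1, down=0.5, n_steps=2000, seed=0):
    """Sketch of stochastic approximation with sign-based step-size adaptation:
    the step gamma_t grows when consecutive observations satisfy
    y_t * y_{t-1} > 0 and shrinks otherwise.  The multiplicative rule
    (up/down factors) is an illustrative choice, not the rule from the paper."""
    rng = np.random.default_rng(seed)
    x, gamma = x0, gamma0
    y_prev = None
    for _ in range(n_steps):
        y = phi(x) + rng.normal(0.0, noise_std)   # noisy observation y_t = phi(x_t) + xi_t
        x = x - gamma * y                         # x_{t+1} = x_t - gamma_t * y_t
        if y_prev is not None:
            # grow the step while observations keep the same sign,
            # shrink it once they start oscillating around the root
            gamma = gamma * up if y * y_prev > 0 else gamma * down
        y_prev = y
    return x

# usage: search for the zero of phi(x) = x - 2 (root at x* = 2)
if __name__ == "__main__":
    x_star = adaptive_stochastic_approximation(lambda x: x - 2.0, x0=10.0)
    print(x_star)  # should end up close to 2
```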

Cited by 18 publications (10 citation statements). References: 7 publications.
“…The parameter β > 0 is fixed at the beginning of each run, as discussed below, and the SQN method is implemented as described in Algorithm 1. It is well known amongst the optimization and machine learning communities that the SGD method can be improved by choosing the parameter β via a set of problem dependent heuristics [19,27]. In some cases, β k (rather than β) is made to vary during the course of the iteration, and could even be chosen so that β k /k is constant, in which case only convergence to a neighborhood of the solution is guaranteed [15].…”
Section: Numerical Experiments
confidence: 99%
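The distinction drawn in this excerpt, between the standard diminishing step β/k and a step chosen so that it stays constant over the iterations (in which case only convergence to a neighbourhood of the solution is guaranteed), can be illustrated with a toy run. The quadratic test problem, noise level, and parameter values below are assumptions for illustration, not taken from the cited SQN paper.

```python
import numpy as np

def sgd_on_noisy_quadratic(beta, n_iters=5000, constant_step=False, seed=0):
    """Illustrative SGD run on f(x) = 0.5 * x**2 with additive gradient noise.
    With step alpha_k = beta / k the iterates approach the minimiser x* = 0;
    with a constant step alpha_k = beta they only reach a noise-dominated
    neighbourhood of x*.  The test problem and noise level are assumptions
    made for this sketch."""
    rng = np.random.default_rng(seed)
    x = 5.0
    for k in range(1, n_iters + 1):
        g = x + rng.normal(0.0, 1.0)              # noisy gradient of 0.5 * x**2
        alpha = beta if constant_step else beta / k
        x -= alpha * g
    return x

print(sgd_on_noisy_quadratic(beta=1.0))                      # close to 0
print(sgd_on_noisy_quadratic(beta=0.1, constant_step=True))  # hovers near 0 within a noise band
```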
“…Our multi-level step-size adaptation idea is inspired by Plakhov and Cruz [28] and Klein et al. [29]: if two consecutive gradients ∇F_{t−1} and ∇F_t are in the same direction, i.e.…”
Section: Adaptive SGD on the Grassmannian
confidence: 99%
“…The most important parameter of GASG21 is μ_max, which controls how fast GASG21 goes to the next level. GASG21 does not generate the actual step-size η_j by the adaptive step-size framework [28, 29]; rather, the adaptive step-size framework is used to generate the important sequence μ_j. According to Klein et al.…”
Section: Adaptive SGD on the Grassmannian
confidence: 99%
“…If the gradients point in opposite directions, the step size is reduced. The theoretical convergence properties of the method in one-dimensional (P = 1) optimisation problems were studied by Plakhov and Cruz (2004). Cruz (2005a) extended the analysis to multidimensional (P > 1) problems.…”
Section: (μ) ≡ C(f_M • T)
confidence: 99%
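For the multidimensional case (P > 1) mentioned in the last excerpt, a natural analogue of the one-dimensional sign test y_t y_{t−1} > 0 is the inner product of consecutive noisy gradients. The sketch below rests on that assumption; the multiplicative factors and the quadratic test problem are illustrative choices, not the scheme analysed by Cruz (2005a).

```python
import numpy as np

def adaptive_sgd_nd(grad, x0, gamma0=0.1, up=1.05, down=0.5,
                    noise_std=0.1, n_steps=3000, seed=0):
    """Multidimensional analogue of the sign-based rule: grow the step when
    consecutive noisy gradients point in roughly the same direction
    (positive inner product) and shrink it when they oppose each other.
    The multiplicative factors and the quadratic test problem below are
    assumptions for illustration only."""
    rng = np.random.default_rng(seed)
    x, gamma = np.asarray(x0, dtype=float), gamma0
    g_prev = None
    for _ in range(n_steps):
        g = grad(x) + rng.normal(0.0, noise_std, size=x.shape)  # noisy gradient
        x = x - gamma * g
        if g_prev is not None:
            gamma = gamma * up if np.dot(g, g_prev) > 0 else gamma * down
        g_prev = g
    return x

# usage: minimise f(x) = 0.5 * ||x - c||^2, whose exact gradient is x - c
c = np.array([1.0, -2.0, 3.0])
print(adaptive_sgd_nd(lambda x: x - c, x0=np.zeros(3)))  # approaches c
```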