1974 IEEE Conference on Decision and Control Including the 13th Symposium on Adaptive Processes
DOI: 10.1109/cdc.1974.270399
On the Goldstein-Levitin-Polyak gradient projection method

Abstract: This paper considers some aspects of a gradient projection method proposed by Goldstein [1], Levitin and Polyak [3], and more recently, in a less general context, by McCormick [10]. We propose and analyze some convergent step-size rules to be used in conjunction with the method. These rules are similar in spirit to the efficient Armijo rule for the method of steepest descent and under mild assumptions they have the desirable property that they identify the set of active inequality constraints in a finite numbe…
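The abstract describes gradient projection with an Armijo-like step-size rule. A minimal sketch of that scheme for a box constraint, where the trial point is projected and the step is backtracked until a sufficient-decrease test holds (function names, constants, and the exact form of the test are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi] (componentwise clipping)."""
    return np.clip(x, lo, hi)

def gradient_projection(f, grad, x0, lo, hi,
                        s=1.0, beta=0.5, sigma=1e-4,
                        max_iter=200, tol=1e-8):
    """Gradient projection with an Armijo-style backtracking rule along
    the projection arc; a sketch of the kind of method the paper analyzes."""
    x = project_box(np.asarray(x0, dtype=float), lo, hi)
    for _ in range(max_iter):
        g = grad(x)
        alpha = s
        while alpha > 1e-16:
            x_new = project_box(x - alpha * g, lo, hi)
            # Sufficient decrease relative to the projected step x - x_new.
            if f(x) - f(x_new) >= sigma * g.dot(x - x_new):
                break
            alpha *= beta
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```

For example, minimizing the distance to a point outside the box returns the projection of that point onto the box, and every iterate stays feasible.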

Cited by 80 publications (121 citation statements) | References: 0 publications
“…Let d_k be the minimizer of (3). (This minimizer exists and is unique by the strict convexity of the subproblem (3), but we will see later that we do not need to compute it.)…”
Section: Algorithm 2.1: Inexact Variable Metric Methods
confidence: 99%
“…The SPG method is related to the practical version of Bertsekas [3] of the classical gradient projection method of Goldstein, Levitin and Polyak [21,25]. However, some critical differences make this method much more efficient than its gradient-projection predecessors.…”
Section: Introduction
confidence: 99%
“…The last term on the right-hand side, (5.5), measures the error in the Hessian approximation, along the direction d, due to the use of a smaller sample H_k. It is apparent from (5.4) that it is inefficient to require that the residual r_k be significantly smaller than the Hessian approximation error ∆H_k(w_k; d), as the extra effort in solving the linear system may not lead to an improved search direction for the objective function J_{S_k}(w).…”
Section: The Conjugate Gradient Iteration
confidence: 99%
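The excerpt above argues that the conjugate gradient residual need not be driven far below the Hessian-approximation error. A hedged sketch of that truncation idea — stop the CG solve once the residual norm drops below an error estimate — where the function name and the form of the stopping rule are assumptions for illustration:

```python
import numpy as np

def truncated_cg(A, b, err_est, max_iter=100):
    """Conjugate gradient for A x = b (A symmetric positive definite),
    stopped once ||r|| <= err_est, since extra accuracy beyond the
    Hessian-approximation error would be wasted work."""
    x = np.zeros_like(b)
    r = b - A @ x            # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= err_est:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

With a tiny error estimate this solves the system; with a large one it returns early (here immediately, with the zero vector), which is exactly the intended trade-off.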
“…However, there is a significant difference with other CRF and HCRF models that use such techniques to find optimal parameters: we are constrained to only positive θ-parameters. Since we are using a quasi-Newton method with Armijo backtracking line search, we can use the gradient projection method of [12,13] to enforce this constraint. Finally, it is important to stress here that, although our model includes parameters that are not treated probabilistically, we have not seen signs of overfitting in our experiments (see Fig.
Section: Model Training
confidence: 99%