Abstract: This paper deals with sparse feature selection and grouping for classification and regression. The classification or regression problems under consideration consist in minimizing a convex empirical risk function subject to an ℓ₁ constraint, a pairwise ℓ∞ constraint, or a pairwise ℓ₁ constraint. Existing work, such as the Lasso formulation, has focused mainly on Lagrangian penalty approximations, which often require ad hoc or computationally expensive procedures to determine the penalization parameter. We depart from this approach and address the constrained problem directly via a splitting method. The structure of the method is that of the classical gradient-projection algorithm, which alternates a gradient step on the objective and a projection step onto the lower level set modeling the constraint. The novelty of our approach is that the projection step is implemented via an outer approximation scheme in which the constraint set is approximated by a sequence of simple convex sets, each the intersection of two half-spaces. Convergence of the iterates generated by the algorithm is established for a general smooth convex minimization problem with inequality constraints. Experiments on both synthetic and biological data show that our method outperforms penalty methods.
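To make the gradient-projection-with-outer-approximation idea concrete, here is a minimal numpy sketch for the ℓ₁-constrained case. It is not the paper's exact algorithm: the abstract describes cuts formed by the intersection of two half-spaces, whereas this sketch uses the simpler single half-space cut built from a subgradient of the ℓ₁ norm at the current point; the function names (`project_halfspace`, `gradient_projection_l1`), the toy least-squares risk, and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def project_halfspace(x, u, c):
    """Euclidean projection of x onto the half-space {z : <u, z> <= c}."""
    gap = u @ x - c
    if gap <= 0:
        return x  # already inside the half-space
    return x - (gap / (u @ u)) * u

def gradient_projection_l1(grad_f, x0, eta, step, n_iter=500):
    """Gradient step on the risk f, then an outer-approximation step:
    the l1 ball {x : ||x||_1 <= eta} is replaced at each iteration by a
    separating half-space (a single-cut simplification of the two
    half-space scheme described in the abstract)."""
    x = x0.copy()
    for _ in range(n_iter):
        y = x - step * grad_f(x)        # gradient step on the objective
        if np.abs(y).sum() <= eta:      # y is already feasible
            x = y
            continue
        u = np.sign(y)                  # subgradient of ||.||_1 at y
        # {z : <u, z> <= eta} contains the l1 ball (since <u, z> <= ||z||_1)
        # and excludes y (since <u, y> = ||y||_1 > eta), so projecting
        # onto it is a valid outer-approximation projection step
        x = project_halfspace(y, u, eta)
    return x

# toy usage: minimize ||A x - b||^2 subject to ||x||_1 <= eta
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
grad = lambda x: 2 * A.T @ (A @ x - b)
step = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)   # 1/L for this gradient
x_hat = gradient_projection_l1(grad, np.zeros(20), eta=2.0, step=step)
print(np.abs(x_hat).sum())  # should settle near or below eta
```

Because each cut only outer-approximates the constraint set, individual iterates may be slightly infeasible; feasibility is recovered in the limit, which is the trade-off that makes the per-iteration projection cheap.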
Efficient big-data classification requires low-cost learning methods. A standard approach involves the Stochastic Gradient Descent (SGD) algorithm for the minimization of the hinge loss in the primal space. Although the complexity of SGD is linear in the number of samples, the method suffers from slow convergence. To cope with this issue, we propose a Boosting Stochastic Newton Descent (BSND) method for the minimization of any calibrated loss in the primal space. BSND approximates the inverse Hessian by its best low-rank approximation. We validate BSND by benchmarking it against several variants of the state-of-the-art SGD algorithm on the large-scale ImageNet and Higgs datasets. We provide further core optimizations for fast convergence. The results on the ImageNet and Higgs data sets show that BSND significantly improves the accuracy of the SGD baseline while being faster by orders of magnitude.
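The abstract does not spell out BSND's update rule, so the following is only a generic sketch of the core ingredient it names: a stochastic Newton step whose inverse Hessian is replaced by a rank-k approximation (Newton scaling along the top-k mini-batch Hessian eigendirections, a plain gradient step on the complement). The logistic loss stands in for "any calibrated loss", and the function name `stochastic_lowrank_newton` and all hyperparameters are hypothetical; the boosting component of BSND is not reproduced here.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def stochastic_lowrank_newton(X, y, k=5, lr=1.0, batch=64,
                              epochs=5, damping=1e-3, seed=0):
    """Stochastic Newton with a rank-k inverse-Hessian approximation
    for logistic regression (labels y in {0, 1})."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for _ in range(n // batch):
            idx = rng.choice(n, batch, replace=False)
            Xb, yb = X[idx], y[idx]
            p = sigmoid(Xb @ w)
            g = Xb.T @ (p - yb) / batch              # mini-batch gradient
            S = p * (1.0 - p)
            H = (Xb * S[:, None]).T @ Xb / batch     # mini-batch Hessian
            vals, vecs = np.linalg.eigh(H)           # ascending eigenvalues
            V, lam = vecs[:, -k:], vals[-k:]         # top-k eigenpairs
            gk = V.T @ g
            # Newton scaling on the rank-k subspace, gradient step elsewhere
            step = V @ (gk / (lam + damping)) + (g - V @ gk)
            w -= lr * step
    return w

# toy usage with synthetic separable data
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 10))
w_true = rng.standard_normal(10)
y = (X @ w_true > 0).astype(float)
w = stochastic_lowrank_newton(X, y, k=5)
print(np.mean((sigmoid(X @ w) > 0.5) == y))  # training accuracy
```

The rank-k eigendecomposition keeps the per-step cost close to SGD's while capturing the curvature directions that dominate the conditioning, which is the mechanism behind the claimed speedup.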