The usual dimensionality reduction technique in supervised learning is mainly based on linear discriminant analysis (LDA), but it suffers from singularity or undersampled problems. On the other hand, a regular support vector machine (SVM) separates the data only in terms of one single direction of maximum margin, and the classification accuracy may be not good enough. In this letter, a recursive SVM (RSVM) is presented, in which several orthogonal directions that best separate the data with the maximum margin are obtained. Theoretical analysis shows that a completely orthogonal basis can be derived in feature subspace spanned by the training samples and the margin is decreasing along the recursive components in linearly separable cases. As a result, a new dimensionality reduction technique based on multilevel maximum margin components and then a classifier with high accuracy are achieved. Experiments in synthetic and several real data sets show that RSVM using multilevel maximum margin features can do efficient dimensionality reduction and outperform regular SVM in binary classification problems.
Stochastic Coordinate Descent (SCD) methods are among the first optimization schemes suggested for efficiently solving large scale problems. However, until now, there exists a gap between the convergence rate analysis and practical SCD algorithms for general smooth losses and there is no primal SCD algorithm for nonsmooth losses. In this paper, we discuss these issues using the recently developed structural optimization techniques. In particular, we first present a principled and practical SCD algorithm for regularized smooth losses, in which the one-variable subproblem is solved using the proximal gradient method and the adaptive componentwise Lipschitz constant is obtained employing the line search strategy. When the loss is nonsmooth, we present a novel SCD algorithm, in which the one-variable subproblem is solved using the dual averaging method. We show that our algorithms exploit the regularization structure and achieve several optimal convergence rates that are standard in the literature. The experiments demonstrate the expected efficiency of our SCD algorithms in both smooth and nonsmooth cases.
The truncated regular -loss support vector machine can eliminate the excessive number of support vectors (SVs); thus, it has significant advantages in robustness and scalability. However, in this paper, we discover that the associated state-of-the-art solvers, such as difference convex algorithm and concave-convex procedure, not only have limited sparsity promoting property for general truncated losses especially the -loss but also have poor scalability for large-scale problems. To circumvent these drawbacks, we present a general multistage scheme with explicit interpretation regarding SVs as well as outliers. In particular, we solve the general nonconvex truncated loss minimization through a sequence of associated convex subproblems, in which the outliers are removed in advance. The proposed algorithm can be regarded as a structural optimization attempt carefully considering sparsity imposed by the nonconvex truncated losses. We show that this general multistage algorithm offers sufficient sparsity especially for the truncated -loss. To further improve the scalability, we propose a linear multistep algorithm by employing a single iteration of coordinate descent to monotonically decrease the objective function at each stage and a kernel algorithm by using the Karush-Kuhn-Tucker conditions to cheaply find most part of the outliers for the next stage. Comparison experiments demonstrate that our methods have superiority in sparsity as well as efficiency in scalability.
A wide variety of learning problems can be posed in the framework of convex optimization. Many efficient algorithms have been developed based on solving the induced optimization problems. However, there exists a gap between the theoretically unbeatable convergence rate and the practically efficient learning speed. In this paper, we use the variational inequality (VI) convergence to describe the learning speed. To this end, we avoid the hard concept of regret in online learning and directly discuss the stochastic learning algorithms. We first cast the regularized learning problem as a VI. Then, we present a stochastic version of alternating direction method of multipliers (ADMMs) to solve the induced VI. We define a new VI-criterion to measure the convergence of stochastic algorithms. While the rate of convergence for any iterative algorithms to solve nonsmooth convex optimization problems cannot be better than O(1/√t), the proposed stochastic ADMM (SADMM) is proved to have an O(1/t) VI-convergence rate for the l1-regularized hinge loss problems without strong convexity and smoothness. The derived VI-convergence results also support the viewpoint that the standard online analysis is too loose to analyze the stochastic setting properly. The experiments demonstrate that SADMM has almost the same performance as the state-of-the-art stochastic learning algorithms but its O(1/t) VI-convergence rate is capable of tightly characterizing the real learning speed.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.