In dealing with large data sets, the reduced support vector machine (RSVM) was proposed to overcome computational difficulties as well as to reduce model complexity. In this paper, we study the RSVM from the viewpoints of sampling design, robustness, and the spectral analysis of the reduced kernel. We consider the nonlinear separating surface as a mixture of kernels. Instead of a full model, the RSVM uses a reduced mixture with kernels sampled from a certain candidate set. Our main results center on two major themes. One is the robustness of the random subset mixture model. The other is the spectral analysis of the reduced kernel. Robustness is judged by three criteria: 1) a model variation measure; 2) the model bias (deviation) between the reduced model and the full model; and 3) the test power in distinguishing the reduced model from the full one. For the spectral analysis, we compare the eigenstructures of the full kernel matrix and the approximation kernel matrix, where the approximation kernels are generated by uniform random subsets. The small discrepancies between the two indicate that the approximation kernels retain most of the information in the full kernel that is relevant for learning tasks. We focus on the statistical theory of the reduced set method, mainly in the context of the RSVM. The use of a uniform random subset, however, is not limited to the RSVM: it can act as a supplemental step on top of a basic optimization algorithm, wherein the actual optimization takes place on the subset-approximated data, and the statistical properties discussed in this paper remain valid.
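As a concrete illustration of the reduced kernel idea, the sketch below draws a uniform random subset, forms the rectangular reduced kernel, and compares the leading eigenvalues of the full kernel matrix with those of a Nystrom-style approximation built from the subset. This is a minimal sketch: the RBF kernel, the subset size m, and the bandwidth are illustrative assumptions, and the paper's exact construction of the approximation kernel may differ.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2), computed row-pairwise.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))             # full data, n x d
m = 50                                    # reduced subset size (illustrative)
idx = rng.choice(len(X), size=m, replace=False)

K_full = rbf_kernel(X, X)                 # full n x n kernel matrix
K_nm = rbf_kernel(X, X[idx])              # reduced n x m kernel
K_mm = rbf_kernel(X[idx], X[idx])         # m x m kernel on the subset

# Nystrom-style approximation of the full kernel from the random subset.
K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

# Small discrepancies in the leading spectra indicate that the subset
# retains most of the learning-relevant information in the full kernel.
top_full = np.linalg.eigvalsh(K_full)[::-1][:10]
top_approx = np.linalg.eigvalsh(K_approx)[::-1][:10]
print(np.c_[top_full, top_approx])
```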
Principal Component Analysis (PCA) is a commonly used tool for dimension reduction in analyzing high-dimensional data; Multilinear Principal Component Analysis (MPCA) has the potential to serve a similar function for analyzing tensor-structured data. MPCA and other tensor decomposition methods have proved effective in reducing dimensionality in both real data analyses and simulation studies (Ye, 2005; Lu, Plataniotis and Venetsanopoulos, 2008; Kolda and Bader, 2009; Li, Kim and Altman, 2010). In this paper, we investigate MPCA's statistical properties and provide explanations for its advantages. Conventional PCA, which vectorizes the tensor data, may lead to inefficient and unstable prediction because of the resulting extremely large dimensionality. MPCA, by contrast, preserves the data structure: it searches for low-dimensional multilinear projections and reduces the dimensionality efficiently. We develop asymptotic theory for order-two MPCA, including asymptotic distributions of the principal components, the associated projections, and the explained variance. Finally, MPCA is shown to improve on conventional PCA in analyzing the Olivetti Faces data set, constructing more module-oriented bases for reconstructing the test faces.
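A minimal sketch of order-two MPCA under common assumptions: alternating mode-wise eigendecompositions for matrix-valued samples. The function name mpca2, the data shapes, the ranks k1 and k2, and the fixed iteration count are illustrative, and the paper's algorithm may differ in details.

```python
import numpy as np

def mpca2(X, k1, k2, n_iter=10):
    # Order-two MPCA sketch: X has shape (n, p, q); returns row and column
    # projections U (p x k1) and V (q x k2), so each sample is compressed
    # to the small k1 x k2 score matrix U.T @ X_i @ V.
    n, p, q = X.shape
    Xc = X - X.mean(axis=0)                  # center the samples
    U = np.eye(p)[:, :k1]                    # initial projections
    V = np.eye(q)[:, :k2]
    for _ in range(n_iter):                  # alternate between the two modes
        # Mode-1 covariance with the column mode projected onto V.
        C1 = sum(Xi @ V @ V.T @ Xi.T for Xi in Xc)
        U = np.linalg.eigh(C1)[1][:, ::-1][:, :k1]
        # Mode-2 covariance with the row mode projected onto U.
        C2 = sum(Xi.T @ U @ U.T @ Xi for Xi in Xc)
        V = np.linalg.eigh(C2)[1][:, ::-1][:, :k2]
    return U, V

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 32, 32))           # e.g. 100 small images
U, V = mpca2(X, k1=5, k2=5)
# Each 32 x 32 sample reduces to a 5 x 5 score matrix, versus a
# 1024-dimensional vector under vectorized (conventional) PCA.
scores = np.einsum('pk,npq,ql->nkl', U, X - X.mean(0), V)
```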
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses, and fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into account. Another common method is to adopt a robust M-estimation that down-weights suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) it does not need to model the mislabel probabilities; (2) the minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, i.e., it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are provided. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression.
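The following sketch fits a γ-logistic regression by directly minimizing one common form of the empirical γ-cross-entropy; the paper's exact objective, its estimating-equation algorithms, and the tuning of γ may differ, and the data generation (including the 5% mislabel rate) is purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def gamma_logistic_fit(X, y, gamma=0.5):
    # Minimize (one form of) the negative empirical gamma-cross-entropy.
    # As gamma -> 0 this recovers the ordinary logistic log-likelihood.
    def loss(beta):
        p1 = 1.0 / (1.0 + np.exp(-X @ beta))        # P(y = 1 | x)
        f_obs = np.where(y == 1, p1, 1.0 - p1)      # f(y_i | x_i)
        norm = (p1 ** (1 + gamma) + (1 - p1) ** (1 + gamma)) ** (gamma / (1 + gamma))
        return -np.log(np.mean(f_obs ** gamma / norm)) / gamma
    return minimize(loss, np.zeros(X.shape[1]), method="BFGS").x

rng = np.random.default_rng(2)
n = 400
X = np.c_[np.ones(n), rng.normal(size=(n, 2))]      # intercept + 2 covariates
eta = X @ np.array([-0.5, 1.5, -1.0])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-eta))).astype(int)
flip = rng.uniform(size=n) < 0.05                   # 5% mislabeled responses
y[flip] = 1 - y[flip]

beta_hat = gamma_logistic_fit(X, y, gamma=0.5)
# Implicit weighting scheme: w_i = f(y_i | x_i)^gamma is small for
# suspected mislabels, so they barely influence the fit.
p1 = 1 / (1 + np.exp(-X @ beta_hat))
w = np.where(y == 1, p1, 1 - p1) ** 0.5
print(beta_hat, np.argsort(w)[:5])                  # smallest weights flag suspects
```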
This letter discusses the robustness issue of kernel principal component analysis. A class of new robust procedures is proposed based on the eigenvalue decomposition of a weighted covariance. The proposed procedures place less weight on deviant patterns and are thus more resistant to data contamination and model deviation. Theoretical influence functions are derived, and numerical examples are presented as well. Both theoretical and numerical results indicate that the proposed robust method outperforms the conventional approach in the sense of being less sensitive to outliers. Our robust method and results also apply to functional principal component analysis.
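A minimal sketch of the weighted-covariance idea for kernel PCA follows. The exponential weight function, its scaling by the median distance, and the fixed iteration count are choices of this sketch, not the paper's specification; only the overall scheme (down-weight deviant patterns, then eigendecompose the weighted, centered kernel) reflects the abstract.

```python
import numpy as np

def robust_kpca(K, n_iter=20, c=1.0):
    # Robust kernel PCA sketch: K is an n x n Gram matrix. Deviant points
    # receive small weights and so contribute little to the components.
    n = K.shape[0]
    w = np.full(n, 1.0 / n)                      # start from uniform weights
    for _ in range(n_iter):
        # Squared feature-space distance of each point to the weighted mean.
        d2 = np.diag(K) - 2 * K @ w + w @ K @ w
        w = np.exp(-c * d2 / np.median(d2))      # down-weight deviant patterns
        w /= w.sum()
    # Weighted centering: K_c = (I - 1 w') K (I - w 1').
    C = np.eye(n) - np.outer(np.ones(n), w)
    Kc = C @ K @ C.T
    # Eigendecomposition of W^(1/2) K_c W^(1/2), the kernel-side counterpart
    # of the weighted covariance in feature space.
    M = np.sqrt(w)[:, None] * Kc * np.sqrt(w)[None, :]
    vals, vecs = np.linalg.eigh(M)
    return vals[::-1], vecs[:, ::-1], w

rng = np.random.default_rng(3)
X = np.r_[rng.normal(size=(95, 2)), rng.normal(8.0, 1.0, size=(5, 2))]
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
vals, vecs, w = robust_kpca(np.exp(-0.5 * d2))   # RBF kernel, 5 outliers
print(np.argsort(w)[:5])                         # the outliers get the smallest weights
```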