LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Non-negative matrix factorization (NMF) can be formulated as a minimization problem with bound constraints. Although bound-constrained optimization has been studied extensively in both theory and practice, so far no study has formally applied its techniques to NMF. In this paper, we propose two projected gradient methods for NMF, both of which exhibit strong optimization properties. We discuss efficient implementations and demonstrate that one of the proposed methods converges faster than the popular multiplicative update approach. A simple MATLAB code is also provided.
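To make the bound-constrained formulation concrete, the following is a minimal Python sketch of alternating projected gradient steps for minimizing 0.5 * ||V - WH||_F^2 subject to W >= 0 and H >= 0. It is not either of the two methods proposed in the paper, nor the MATLAB code the abstract mentions: the fixed step size 1/L (with L the spectral norm of the relevant Gram matrix), the function name `nmf_projected_gradient`, and all parameters are assumptions chosen for illustration.

```python
import numpy as np

def nmf_projected_gradient(V, rank, iters=200, seed=0):
    """Alternating projected gradient steps for 0.5 * ||V - W H||_F^2, W, H >= 0.

    Illustrative simplification: each step uses the fixed size 1/L, where L is
    the Lipschitz constant of the gradient of the corresponding subproblem.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    history = []
    for _ in range(iters):
        # Gradient with respect to W is (W H - V) H^T = W (H H^T) - V H^T.
        HHt = H @ H.T
        step_w = 1.0 / max(np.linalg.norm(HHt, 2), 1e-12)
        # Gradient step followed by projection onto the bound W >= 0.
        W = np.maximum(W - step_w * (W @ HHt - V @ H.T), 0.0)

        # Gradient with respect to H is W^T (W H - V) = (W^T W) H - W^T V.
        WtW = W.T @ W
        step_h = 1.0 / max(np.linalg.norm(WtW, 2), 1e-12)
        H = np.maximum(H - step_h * (WtW @ H - W.T @ V), 0.0)

        history.append(0.5 * np.linalg.norm(V - W @ H, "fro") ** 2)
    return W, H, history
```

The projection onto the non-negative orthant is just an element-wise maximum with zero, which is what makes the bound-constrained view of NMF convenient to implement.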
Support vector machines (SVMs) with the Gaussian (RBF) kernel have been popular for practical use. Model selection in this class of SVMs involves two hyperparameters: the penalty parameter C and the kernel width σ. This paper analyzes the behavior of the SVM classifier when these hyperparameters take very small or very large values. Our results give a better understanding of the hyperparameter space, which leads to an efficient heuristic method for searching for hyperparameter values with small generalization errors. The analysis also indicates that if complete model selection using the Gaussian kernel has been conducted, there is no need to consider linear SVM.
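The limiting behavior of the kernel width can be seen directly in the kernel matrix. The sketch below is a small illustration and not the paper's analysis or its heuristic search: it assumes the parameterization K(x, z) = exp(-||x - z||^2 / (2σ^2)), and the function name `rbf_kernel_matrix` is hypothetical. Under that assumption, a very large σ makes every entry approach 1, while a very small σ makes the matrix approach the identity for distinct points.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """Gaussian kernel matrix, assuming K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Pairwise squared distances: ||x||^2 + ||z||^2 - 2 x.z
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Guard against tiny negative values caused by floating-point rounding.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```

For example, calling `rbf_kernel_matrix(X, 1e3)` on a handful of points with unit-scale coordinates gives a matrix of near-ones, and `rbf_kernel_matrix(X, 1e-2)` gives a near-identity matrix.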
In many applications, data appear with a huge number of instances as well as features. The linear support vector machine (SVM) is one of the most popular tools for dealing with such large-scale sparse data. This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. The proposed method is simple and reaches an ε-accurate solution in O(log(1/ε)) iterations. Experiments indicate that our method is much faster than state-of-the-art solvers such as Pegasos, TRON, SVMperf, and a recent primal coordinate descent implementation.
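The idea of dual coordinate descent for linear SVM can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's implementation: it has no bias term, no shrinking, and runs a fixed number of epochs instead of checking a stopping condition. The function name `dual_cd_linear_svm` and its parameters are hypothetical. Each step minimizes the dual objective over a single α_i, clips it to its bounds, and keeps w = Σ α_i y_i x_i up to date so that the gradient costs only one inner product.

```python
import numpy as np

def dual_cd_linear_svm(X, y, C=1.0, loss="l1", epochs=50, seed=0):
    """Simplified dual coordinate descent for a linear SVM without a bias term.

    loss="l1": hinge loss, 0 <= alpha_i <= C, no diagonal term.
    loss="l2": squared hinge loss, alpha_i >= 0, diagonal term 1 / (2C).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if loss == "l1":
        upper, diag = C, 0.0
    else:
        upper, diag = np.inf, 1.0 / (2.0 * C)

    alpha = np.zeros(n)
    w = np.zeros(d)
    # Diagonal of the dual Hessian: x_i . x_i + D_ii
    q_diag = np.einsum("ij,ij->i", X, X) + diag

    for _ in range(epochs):
        for i in rng.permutation(n):
            if q_diag[i] == 0.0:
                continue  # all-zero instance with l1 loss; nothing to update
            # Partial derivative of the dual objective with respect to alpha_i.
            grad = y[i] * (w @ X[i]) - 1.0 + diag * alpha[i]
            # Exact one-variable minimizer, clipped to [0, upper].
            new_alpha = min(max(alpha[i] - grad / q_diag[i], 0.0), upper)
            # Maintain w = sum_i alpha_i * y_i * x_i incrementally.
            w += (new_alpha - alpha[i]) * y[i] * X[i]
            alpha[i] = new_alpha
    return w, alpha
```

Maintaining w explicitly is the key design choice: the per-coordinate cost is proportional to the number of nonzero features in one instance, which is what makes the approach attractive for sparse data.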
Gram-negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. We present an approach to predict subcellular localization for Gram-negative bacteria. This method uses support vector machines trained on multiple feature vectors based on n-peptide compositions. For a standard data set comprising 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest prediction rate ever reported. Our prediction accuracy is 14% higher than that of the recently developed multimodular PSORT-B. Because of its simplicity, this approach can be easily extended to other organisms and should be a useful tool for the high-throughput and large-scale analysis of proteomic and genomic data.
Keywords: subcellular localization; support vector machine; Gram-negative bacteria; machine-learning method; proteome; genome; n-peptide compositions
The subcellular location of a protein is closely correlated to its biological function (Jensen et al. 2002). With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict protein subcellular localization becomes increasingly important. Many efforts have been made to predict protein subcellular localization. There are methods (Nakai and Kanehisa 1992; Nielsen et al. 1997; Emanuelsson et al. 1999, 2000; Nakai 2000) based on the observation that sequences targeted to specific locations rely on the N-terminal sorting or signal sequences. For example, TargetP (Emanuelsson et al. 2000), a useful tool for analysis of signal peptides, predicts protein subcellular localization for eukaryotic sequences.
On the other hand, a number of studies (Cedano et al. 1997; Andrade et al. 1998; Reinhardt and Hubbard 1998; Yuan 1999; Chou 2001; Hua and Sun 2001; Chou and Cai 2002) have shown that amino acid compositions are useful in discriminating protein subcellular localization sites. Cedano et al. (1997) developed a predictive system, ProtLock, based on a correlation analysis of the amino acid compositions and the cellular locations for five protein classes. Reinhardt and Hubbard (1998) developed a neural network approach, NNPSL, based on amino acid compositions for both eukaryotic and prokaryotic sequences. For the same data sets, Hua and Sun (2001) also developed SubLoc based on support vector machine (SVM) techniques. Chou (2001) developed approaches based on the pseudo amino acid compositions that include sequence-order information.
Gram-negative bacteria have five major subcellular localization sites that include the cytoplasm, the inner membrane, the outer membrane, the periplasm, and the extracellular space.
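As an illustration of composition-based features, the Python sketch below computes the relative frequency of every contiguous n-residue window in a protein sequence. It is an assumption-laden sketch and not the feature definition used in any of the cited systems: the function name `n_peptide_composition`, the restriction to the 20 standard amino acids, and the use of contiguous windows only are all choices made here for illustration. With n = 1 it yields the 20-dimensional amino acid composition, and with n = 2 a 400-dimensional dipeptide composition.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def n_peptide_composition(sequence, n=1):
    """Relative frequency of each contiguous n-residue window in `sequence`.

    Returns a list of length 20**n ordered by the n-fold product of
    AMINO_ACIDS. Windows containing non-standard residues are skipped.
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        if window in counts:
            counts[window] += 1
            total += 1
    # Normalize so the vector sums to 1 (or is all zeros for an empty input).
    return [counts[k] / total if total else 0.0 for k in keys]
```

Vectors of this kind have a fixed length regardless of sequence length, which is what allows them to be fed to a classifier such as an SVM.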