The transition matrix, which describes how clean labels transition to noisy labels, is essential for building statistically consistent classifiers in label-noise learning. Existing methods for estimating the transition matrix rely heavily on estimating the noisy class posterior. However, the estimation error for the noisy class posterior can be large because of the randomness of label noise, and this error causes the transition matrix to be poorly estimated. In this paper, we address the problem by exploiting the divide-and-conquer paradigm. Specifically, we introduce an intermediate class to avoid estimating the noisy class posterior directly. With this intermediate class, the original transition matrix can be factorized into the product of two easy-to-estimate transition matrices. We term the proposed method the dual T-estimator. Both theoretical analyses and empirical results illustrate the effectiveness of the dual T-estimator for estimating transition matrices, leading to better classification performance.
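The factorization described in this abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that the noisy label depends on the clean label solely through the intermediate class, so that the overall transition matrix equals the product of a clean-to-intermediate matrix and an intermediate-to-noisy matrix. The matrix values and the names `t_clean_to_mid`, `t_mid_to_noisy`, and `matmul` are hypothetical and are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Hypothetical values. Row i, column l: P(intermediate = l | clean = i).
t_clean_to_mid = [
    [0.9, 0.1],
    [0.2, 0.8],
]

# Hypothetical values. Row l, column j: P(noisy = j | intermediate = l).
t_mid_to_noisy = [
    [0.8, 0.2],
    [0.3, 0.7],
]

# Under the stated assumption, row i, column j of the product is
# P(noisy = j | clean = i), obtained by summing over the intermediate class.
t = matmul(t_clean_to_mid, t_mid_to_noisy)

for row in t:
    print([round(v, 2) for v in row])
```

Because both factors are row-stochastic, each row of the product also sums to one, so the result is itself a valid transition matrix.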
As one of the most popular and effective classification algorithms, the Support Vector Machine (SVM) has attracted much attention in recent years. Classifier ensembles are a research direction in machine learning and statistics, and they often give higher classification accuracy than a single classifier. This paper proposes a new ensemble algorithm based on SVM. The proposed classification algorithm, PB-SVM Ensemble, consists of several SVM classifiers produced by PCAenSVM and fifty classifiers trained using Bagging; their results are combined by majority voting to make the final decision on the test set. The performance of PB-SVM Ensemble is evaluated on six datasets drawn from the UCI repository, Statlog, or well-known prior research. The experimental results are compared with LibSVM, PCAenSVM, and Bagging. PB-SVM Ensemble outperforms the other three algorithms in classification accuracy while maintaining a higher confidence of accuracy than Bagging.
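The majority-voting step mentioned in this abstract can be sketched as follows. This is only an illustration of the voting rule, not the authors' implementation: the function name `majority_vote`, the tie-breaking behavior, and the example predictions are assumptions, and the classifiers themselves (PCAenSVM and Bagging) are not modeled.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions[c][s] is the label classifier c assigns to sample s.
    Ties go to the label that appears first among the classifiers
    (an assumption of this sketch).
    """
    n_samples = len(predictions[0])
    combined = []
    for s in range(n_samples):
        votes = Counter(p[s] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical outputs of three classifiers on four test samples.
preds = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

final = majority_vote(preds)
print(final)
```

With an odd number of classifiers and two classes, ties cannot occur, so the tie-breaking rule only matters in multi-class or even-sized ensembles.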
SVM (Support Vector Machine) is a powerful data mining algorithm, mainly used for classification and regression tasks. In this paper, SVM is used for disease prediction. We focus on integrating stratified sampling and grid search to improve the classification accuracy of SVM, and we propose an improved algorithm named SGSVM: Stratified sampling and Grid search based SVM. To test the performance of SGSVM, heart-disease data from UCI are used in our experiment. The results show that SGSVM clearly improves classification accuracy, which is especially valuable in disease prediction.
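As a rough illustration of the stratified sampling idea named in this abstract, the sketch below splits a labeled dataset so that each class keeps approximately the same proportion in the training and test sets. The function `stratified_split`, its parameters, and the toy labels are hypothetical; the abstract does not specify how SGSVM performs the split, and the grid search step is not shown.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), sampling each class separately.

    Per-class test counts are rounded, so class proportions are
    preserved only approximately.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 8 negative cases and 4 positive cases.
labels = [0] * 8 + [1] * 4
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
print(len(train_idx), len(test_idx))
```

Sampling within each class keeps a minority class, such as positive disease cases, from being under-represented in either split by chance.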