Training Data Selection for Support Vector Machines
2005
DOI: 10.1007/11539087_71

Cited by 68 publications (47 citation statements, published 2009–2024); references 8 publications.
“…We present the results for the RBF kernel in Table III; for RSSVM on both data sets, the samples are again 0.5%, 1%, and 5% of the data, respectively. The outcome is similar to that of our second experiment, except that RSSVM obtains lower precision on the NWI data set, which indicates that RSSVM is not as reliable as [3] suggests and that its performance depends on the distribution of the data set. A similar problem exists for CBSVM, since it has no tendency to keep important data near the boundary uncompressed while building the CF trees.…”
Section: Experiments and Results (supporting)
confidence: 74%
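For context, a minimal sketch of the random-subsample baseline (RSSVM-style) that the excerpt benchmarks, assuming scikit-learn; the synthetic data, seed, and evaluation are illustrative stand-ins, not the NWI experiments:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_score

# Illustrative stand-in data; the cited experiments use real data sets (e.g. NWI).
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
for rate in (0.005, 0.01, 0.05):  # 0.5%, 1%, and 5% samples, as in the excerpt
    idx = rng.choice(len(X_tr), size=max(2, int(rate * len(X_tr))), replace=False)
    clf = SVC(kernel="rbf").fit(X_tr[idx], y_tr[idx])  # RBF kernel, as in Table III
    print(f"{rate:.1%}: precision={precision_score(y_te, clf.predict(X_te)):.3f}")
```

The instability the excerpt reports follows from this design: whatever the uniform draw happens to miss, including points near the decision boundary, is simply gone from the training set.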
“…It works by sampling a small proportion of the data to approximately reflect the distribution of the entire data set. Experiments have shown that this scheme is efficient and often works well, but it sometimes performs poorly because important information may be lost during sampling [3], [4]. Active learning [9] was developed to reduce costly labeling work by selecting “important” data instances from the data set and requiring the user to label only those instances.…”
Section: Related Work (mentioning)
confidence: 99%
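A hedged sketch of the pool-based active-learning loop the excerpt alludes to, using margin-based uncertainty sampling with an SVM; the seed size and query budget are illustrative assumptions, not values from the cited work:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
labeled = list(range(20))                 # small seed of labeled instances
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(10):                       # query budget (illustrative)
    clf = SVC(kernel="rbf").fit(X[labeled], y[labeled])
    # "Important" instances: those closest to the decision boundary.
    margins = np.abs(clf.decision_function(X[pool]))
    query = pool.pop(int(np.argmin(margins)))
    labeled.append(query)                 # the user would label X[query] here
```

Margin-based querying is one common notion of "important" in this setting; the active-learning work cited as [9] may define importance differently.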
“…A simple yet effective neighborhood analysis of each training vector was proposed by Wang et al. (2005). For each training vector, the largest sphere that contains only vectors of the same class is determined, and the number of vectors encompassed by this sphere is counted (N_a for each vector a).…”
Section: Neighborhood Analysis Methods (mentioning)
confidence: 99%
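A minimal sketch of that neighborhood analysis under one plausible reading: the radius of the largest single-class sphere around a vector is its distance to the nearest opposite-class vector, and N_a counts the same-class vectors strictly inside it. The helper name and brute-force pairwise distances are ours, not from the cited paper:

```python
import numpy as np

def neighborhood_counts(X, y):
    """For each vector a, count same-class vectors inside the largest
    sphere around a that contains no opposite-class vector (N_a)."""
    # Brute-force pairwise distances; fine for a sketch, O(n^2) memory.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    counts = np.empty(len(X), dtype=int)
    for a in range(len(X)):
        radius = D[a][y != y[a]].min()   # nearest opposite-class vector
        same = (y == y[a])
        same[a] = False                  # exclude the point itself
        counts[a] = int(np.sum(D[a][same] < radius))
    return counts
```

Under this reading, a large N_a marks a vector deep inside a homogeneous region, while a small N_a marks one close to the class boundary.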
“…to select the most important samples) plays an important role. A great deal of work on sample selection has been done, based, for example, on clustering methods [4,5], the Mahalanobis distance [6], the β-skeleton and the Hausdorff distance [7,8], and information theory [9,10]. Although much research progress has been made, problems still remain.…”
Section: Introduction (mentioning)
confidence: 99%
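As one concrete instance of the distance-based selection this excerpt lists, a hedged sketch that ranks the samples of each class by Mahalanobis distance from the class mean and keeps the outermost ones (those nearest the class boundary); the function name, keep-fraction, and selection rule are illustrative assumptions, not the method of [6]:

```python
import numpy as np

def select_by_mahalanobis(X, y, keep=0.2):
    """Keep, per class, the fraction of samples farthest (in Mahalanobis
    distance) from the class mean -- a crude boundary-oriented selection."""
    kept = []
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(Xc, rowvar=False))  # pinv for stability
        diff = Xc - mu
        # Squared Mahalanobis distance per row: diff_i^T * cov_inv * diff_i
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
        idx = np.where(y == c)[0]
        kept.extend(idx[np.argsort(d)[-max(1, int(keep * len(Xc))):]])
    return np.sort(np.array(kept))
```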