Lecture Notes in Computer Science
DOI: 10.1007/978-3-540-68123-6_4

Boosting Support Vector Machines for Imbalanced Data Sets

Abstract: Real-world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the instances of one class greatly outnumber those of the other class. Such data sets often cause a default classifier to be built due to skewed vector spaces or a lack of information. Common approaches for dealing with the class imbalance problem involve modifying the data distribution or modifying the classifier. In this work, we choose to use a combination of b…
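For illustration, "modifying the data distribution" usually means resampling the training set. Below is a minimal sketch of random oversampling; the function name and defaults are chosen here for illustration and are not taken from the paper (it assumes the minority class is the smaller one):

    import numpy as np

    def random_oversample(X, y, minority_label=1, seed=0):
        # Duplicate randomly chosen minority-class rows until the two
        # classes are equally represented (one common data-level remedy).
        rng = np.random.default_rng(seed)
        minority = np.flatnonzero(y == minority_label)
        majority = np.flatnonzero(y != minority_label)
        extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
        idx = np.concatenate([np.arange(len(y)), extra])
        return X[idx], y[idx]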

Cited by 90 publications (97 citation statements)
References 8 publications
“…One can also use simpler measures to characterize classifiers, in particular if they have a purely deterministic prediction (see discussions on the applicability of ROC analysis in [61]). Kubat and Matwin [35] proposed to use the geometric mean of sensitivity and specificity, defined as:…”
Section: Evaluation Measures For Learning Classifiers From Imbalanced Data
mentioning
confidence: 99%
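The measure referenced here, Kubat and Matwin's G-mean, is the square root of the product of sensitivity and specificity. A small scikit-learn-based sketch, written here for illustration (the formula is standard; the code is not quoted from the citing paper):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    def g_mean(y_true, y_pred):
        # Geometric mean of sensitivity (recall on the positive class)
        # and specificity (recall on the negative class).
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        return np.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))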
“…While the other group includes many quite specialized methods based on different principles. For instance, many authors changed search strategies, evaluation criteria, or parameters in the internal optimization of the algorithm; see, e.g., [23, 27, 31, 61, 62, 63]. A survey of special changes in ensembles is given in [21], while adaptations to cost-sensitive learning are reviewed in [18].…”
Section: Introduction
mentioning
confidence: 99%
“…We must associate each example in the training dataset not only with a class label but also with likelihood values, which denote the degree of membership towards the positive and negative classes. We then feed the few labeled negative examples and the generated likelihood values into the learning phase of SVDD to build a more accurate classifier [9].…”
Section: Introduction
mentioning
confidence: 99%
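A rough sketch of the idea described here, using scikit-learn's OneClassSVM as an SVDD-like stand-in; the data and likelihood values below are hypothetical placeholders, not the cited authors' implementation:

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Hypothetical data: X holds mostly-positive examples and w holds
    # generated per-example likelihoods (degree of positive membership).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w = rng.uniform(0.1, 1.0, size=200)

    # OneClassSVM is an SVDD-like boundary learner; sample_weight makes
    # low-likelihood examples count less when fitting the boundary.
    model = OneClassSVM(kernel="rbf", nu=0.1).fit(X, sample_weight=w)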
“…This group of methods performs inference by assigning weights to each of the examples in the training data, as well as by adjusting the training procedure through different misclassification costs. In this group of techniques, we can identify algorithms for constructing cost-sensitive models such as decision trees (Drummond and Holte 2000), neural networks (Kukar and Kononenko 1998), SVMs (Morik et al. 1999), and ensemble classifiers (Fan et al. 1999; Wang and Japkowicz 2010; Zięba et al. 2014).…”
mentioning
confidence: 99%
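As one concrete instance of this kind of cost adjustment, scikit-learn's SVC accepts per-class misclassification weights; the cost values below are illustrative, not taken from the cited works:

    from sklearn.svm import SVC

    # Errors on the minority class (label 1) are penalized ten times
    # more than errors on the majority class: each class's effective
    # penalty becomes C * class_weight[label].
    clf = SVC(kernel="rbf", C=1.0, class_weight={0: 1.0, 1: 10.0})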
“…Modern solutions utilize boosted SVM classifiers as high-quality, cost-sensitive predictors (Wang and Japkowicz 2010; Zięba et al. 2014). Despite the high predictive accuracy of such models, confirmed by numerous experiments, the problem of setting proper values for the misclassification costs arises during training.…”
mentioning
confidence: 99%
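A minimal sketch in the spirit of a boosted SVM, using scikit-learn's AdaBoostClassifier with an SVC base learner; hyperparameter values are illustrative, and this is not the cited authors' implementation:

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.svm import SVC

    # SAMME boosting reweights the training examples between rounds;
    # SVC accepts per-example weights in fit(), so it can serve as the
    # base learner of the ensemble.
    boosted = AdaBoostClassifier(
        estimator=SVC(kernel="rbf", C=1.0),
        n_estimators=10,
        algorithm="SAMME",
    )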