Lecture Notes in Computer Science
DOI: 10.1007/978-3-540-68123-6_4

Boosting Support Vector Machines for Imbalanced Data Sets

Abstract: Real-world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the instances of one class greatly outnumber those of the other class. Such data sets often cause a default classifier to be built due to skewed vector spaces or a lack of information. Common approaches for dealing with the class imbalance problem involve modifying the data distribution or modifying the classifier. In this work, we choose to use a combination of b…
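For illustration, "modifying the data distribution" usually means resampling the training set. Below is a minimal sketch of random oversampling; the function name and defaults are chosen here for illustration and are not taken from the paper (it assumes the minority class is the smaller one):

    import numpy as np

    def random_oversample(X, y, minority_label=1, seed=0):
        # Duplicate randomly chosen minority-class rows until the two
        # classes are equally represented (one common data-level remedy).
        rng = np.random.default_rng(seed)
        minority = np.flatnonzero(y == minority_label)
        majority = np.flatnonzero(y != minority_label)
        extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
        idx = np.concatenate([np.arange(len(y)), extra])
        return X[idx], y[idx]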

Cited by 90 publications (97 citation statements)
References 8 publications
“…One can also use simpler measures to characterize classifiers, in particular if they have a purely deterministic prediction (see discussions on the applicability of ROC analysis in [61]). Kubat and Matwin [35] proposed to use the geometric mean of sensitivity and specificity, defined as:…”
Section: Evaluation Measures For Learning Classifiers From Imbalanced Data
mentioning
confidence: 99%
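The measure referenced here, Kubat and Matwin's G-mean, is the square root of the product of sensitivity and specificity. A small scikit-learn-based sketch, written here for illustration (the formula is standard; the code is not quoted from the citing paper):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    def g_mean(y_true, y_pred):
        # Geometric mean of sensitivity (recall on the positive class)
        # and specificity (recall on the negative class).
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        return np.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))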
“…While the other group includes many quite specialized methods based on different principles. For instance, many authors changed search strategies, evaluation criteria, or parameters in the internal optimization of the algorithm; see, e.g., [23, 27, 31, 61, 62, 63]. A survey of special changes in ensembles is given in [21], while adaptations to cost-sensitive learning are reviewed in [18].…”
Section: Introduction
mentioning
confidence: 99%
“…We must associate each example in the training dataset not only with a class label but also with likelihood values, which denote the degree of membership towards the positive and negative classes. We then feed the few labeled negative examples and the generated likelihood values into the learning phase of SVDD to build a more accurate classifier [9].…”
Section: Introduction
mentioning
confidence: 99%
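A rough sketch of the idea described here, using scikit-learn's OneClassSVM as an SVDD-like stand-in; the data and likelihood values below are hypothetical placeholders, not the cited authors' implementation:

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Hypothetical data: X holds mostly-positive examples and w holds
    # generated per-example likelihoods (degree of positive membership).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    w = rng.uniform(0.1, 1.0, size=200)

    # OneClassSVM is an SVDD-like boundary learner; sample_weight makes
    # low-likelihood examples count less when fitting the boundary.
    model = OneClassSVM(kernel="rbf", nu=0.1).fit(X, sample_weight=w)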
“…This group of methods performs inference by assigning weights to each of the examples in the training data, as well as by adjusting the training procedure through different misclassification costs. In this group of techniques, we can identify algorithms for constructing cost-sensitive models such as decision trees (Drummond and Holte 2000), neural networks (Kukar and Kononenko 1998), SVMs (Morik et al. 1999), and ensemble classifiers (Fan et al. 1999; Wang and Japkowicz 2010; Zięba et al. 2014).…”
mentioning
confidence: 99%
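As one concrete instance of this kind of cost adjustment, scikit-learn's SVC accepts per-class misclassification weights; the cost values below are illustrative, not taken from the cited works:

    from sklearn.svm import SVC

    # Errors on the minority class (label 1) are penalized ten times
    # more than errors on the majority class: each class's effective
    # penalty becomes C * class_weight[label].
    clf = SVC(kernel="rbf", C=1.0, class_weight={0: 1.0, 1: 10.0})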
“…Modern solutions utilize boosted SVM classifiers as high-quality, cost-sensitive predictors (Wang and Japkowicz 2010; Zięba et al. 2014). Despite the high predictive accuracy of such models, confirmed by numerous experiments, the problem of setting proper values for the misclassification costs arises during training.…”
mentioning
confidence: 99%
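A minimal sketch in the spirit of a boosted SVM, using scikit-learn's AdaBoostClassifier with an SVC base learner; hyperparameter values are illustrative, and this is not the cited authors' implementation:

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.svm import SVC

    # SAMME boosting reweights the training examples between rounds;
    # SVC accepts per-example weights in fit(), so it can serve as the
    # base learner of the ensemble.
    boosted = AdaBoostClassifier(
        estimator=SVC(kernel="rbf", C=1.0),
        n_estimators=10,
        algorithm="SAMME",
    )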