2013
DOI: 10.1186/1471-2105-14-64

Improved shrunken centroid classifiers for high-dimensional class-imbalanced data

Abstract: Background: PAM, a nearest shrunken centroid (NSC) method, is a popular classification method for high-dimensional data. ALP and AHP are NSC algorithms that were proposed to improve upon PAM. The NSC methods base their classification rules on shrunken centroids; in practice, the amount of shrinkage is estimated by minimizing the overall cross-validated (CV) error rate. Results: We show that when data are class-imbalanced, the three NSC classifiers are biased towards the majority class. The bias is larger when the number…
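The shrinkage rule the abstract refers to can be sketched as follows. This is a minimal illustration of the nearest-shrunken-centroid idea only: it soft-thresholds each class centroid's deviation from the overall mean and classifies by nearest shrunken centroid. PAM's pooled within-class standard-deviation scaling and class priors are omitted for brevity, and all names here are illustrative, not the paper's implementation.

```python
import numpy as np

def fit_shrunken_centroids(X, y, delta):
    """Soft-threshold each class centroid's deviation from the overall mean."""
    overall = X.mean(axis=0)
    centroids = {}
    for c in np.unique(y):
        d = X[y == c].mean(axis=0) - overall                  # per-feature deviation
        d = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)   # shrink toward zero
        centroids[c] = overall + d
    return centroids

def predict(X, centroids):
    """Assign each sample to the class with the nearest shrunken centroid."""
    classes = list(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

# Toy two-class data: only the first of five features carries signal.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(0, 1, (20, 5))])
X[20:, 0] += 3.0
y = np.array([0] * 20 + [1] * 20)
cents = fit_shrunken_centroids(X, y, delta=0.5)
print((predict(X, cents) == y).mean())  # training accuracy on separable data
```

With a balanced toy set the classifier behaves well; the paper's point is that when the classes are imbalanced, choosing `delta` by minimizing overall CV error biases predictions toward the majority class.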

Cited by 39 publications (26 citation statements)
References 25 publications
“…At the data level, sample rescaling and resampling strategies have been used to balance data by changing the distribution of samples in different classes, including oversampling (SMOTE) and undersampling (RUS) methods. [47] At the algorithmic level, a cost-sensitive learning approach (class weight) has also been attempted by setting an excessive cost function for misclassification of a minority class sample. [48] In addition, an ensemble classifier combined with a resampling method such as SMOTE + ENN, which is a novel and promising route to reduce the influence of information loss or overfitting, was used for comparison.…”
Section: Discussion
confidence: 99%
“…Consequently, the constructed model has low-quality prediction because all objects are assigned to the dominant, negative 2 , class, regardless of the value of the feature vector [19]. The bias of classification of the imbalanced data in favor of the majority class is even larger for high-dimensional data, where the number of variables greatly exceeds the number of samples [4,17,26,30,69]. And in the vast majority of medical datasets just such a situation occurs [43].…”
Section: Learning From Imbalanced Data
confidence: 96%
“…It happens that the disproportion of samples from each class is on the order of 100:1, 1 000:1 or even 10 000:1 [26]. The usage of conventional learning methods on imbalanced data results in constructing a decision model biased toward the majority class, which is predominant in the training set [4,26,39,41,52]. Consequently, the constructed model has low-quality prediction because all objects are assigned to the dominant, negative 2 , class, regardless of the value of the feature vector [19].…”
Section: Learning From Imbalanced Data
confidence: 99%
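The failure mode these citations describe is easy to make concrete: on 100:1 data, a degenerate model that always predicts the majority class scores about 99% overall accuracy while detecting zero minority samples, which is why the overall (CV) error rate is a misleading criterion here. A tiny demonstration:

```python
import numpy as np

y_true = np.array([0] * 100 + [1] * 1)   # 100:1 imbalance, as in the quote
y_pred = np.zeros_like(y_true)           # "always predict majority" model

accuracy = (y_pred == y_true).mean()                 # ~0.99, looks excellent
minority_recall = (y_pred[y_true == 1] == 1).mean()  # 0.0, detects nothing
print(round(accuracy, 3), minority_recall)
```

This is exactly why the paper argues for class-specific error rates rather than the overall CV error when tuning the shrinkage threshold.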
“…Additionally, class imbalance is an important consideration in classification of biomedical data, and there are techniques [4] which incorporate class distribution within the classification algorithm. Our approach is different in that we separate the classification from data preprocessing where we assume class imbalance is to be handled.…”
Section: Introduction
confidence: 99%