An Improved Oversampling Algorithm Based on the Samples’ Selection Strategy for Classifying Imbalanced Data

Xie, Wenhao; Liang, Gongqian; Dong, Zhonghui; Tan, Baoyu; Zhang, Baosheng

doi:10.1155/2019/3526539

Cited by 36 publications

(21 citation statements)

References 17 publications

“…Similarly, the difference in the performance evaluation of seven classes ranges from 0% to 83%, as shown in ref. [16]. Some results are evidence of the high diversity between the F measure of the majority and minority classes.…”

Section: Related Work (mentioning)

confidence: 94%

Conversion of adverse data corpus to shrewd output using sampling metrics

Ashraf, Saleem, Ahmed, et al. 2020

Vis. Comput. Ind. Biomed. Art

In an imbalanced dataset, at least one class is heavily outnumbered by the others. A machine learning algorithm (classifier) trained with an imbalanced dataset predicts the majority class (frequently occurring) more than the other minority classes (rarely occurring). Training with an imbalanced dataset poses challenges for classifiers; however, applying suitable techniques for reducing class imbalance issues can enhance classifiers' performance. In this study, we consider an imbalanced dataset from an educational context. Initially, we examine all shortcomings regarding the classification of an imbalanced dataset. Then, we apply data-level algorithms for class balancing and compare the performance of classifiers. The performance of the classifiers is measured using the underlying information in their confusion matrices, such as accuracy, precision, recall, and F measure. The results show that classification with an imbalanced dataset may produce high accuracy but low precision and recall for the minority class. The analysis confirms that undersampling and oversampling are effective for balancing datasets, but the latter dominates.
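
The abstract's point that accuracy can stay high while minority-class precision and recall collapse can be illustrated with a minimal sketch. The labels below are hypothetical (95 majority samples, 5 minority samples) and are not data from the cited study; the function name `minority_metrics` is likewise an assumption made for illustration.

```python
def minority_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix metrics, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical labels: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
# Hypothetical predictions: one false positive, one true positive,
# and four minority samples missed (false negatives).
y_pred = [0] * 94 + [1, 1] + [0] * 4

metrics = minority_metrics(y_true, y_pred)
print(metrics)
```

In this made-up case the classifier is right on 95 of 100 samples, yet it recovers only one of the five minority samples, which is why the minority-class recall and F measure are far lower than the accuracy.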

“…This procedure does not pay attention to neighbour examples, which results in an increase of the occurrence of overlapping between classes [28]. To avoid this effect, various adaptive sampling methods [32] have been put forward. Some representative works include Borderline-SMOTE [33] and Adaptive Synthetic sampling (ADA-SYN) [34].…”

Section: Related Work (mentioning)

confidence: 99%

Perceptual Borderline for Balancing Multi-Class Spontaneous Emotional Data

Letaifa, Torres 2021

IEEE Access
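
The citation statement above criticises an oversampling procedure that generates minority samples without looking at neighbouring examples from other classes. As a rough illustration, the following is a minimal sketch of plain SMOTE-style interpolation, assuming that is the kind of procedure meant. It is not the implementation from any of the cited papers; the function name `smote_sketch`, the parameters `k` and `seed`, and the sample points are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Only minority samples are consulted, so nothing stops a synthetic point
    from landing in a region occupied by another class.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbours among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D minority samples at the corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority_points, n_new=5)
print(new_points)
```

Adaptive variants such as Borderline-SMOTE and ADASYN, named in the statement, change how base points are chosen or weighted; that selection step is not shown here.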

“…In order to solve the problems existing in the traditional between-classes separability measure, inspired by three decision-making of clustering thought [25], this paper proposes between-classes separability measure about q neighbors, and the new between-classes separability measure mainly considers the following three factors as follows. (1) The Between-Classes Variance. Starting from the number and distance between the samples, it reflects the closeness of the relationship between a certain class of objects and its neighboring classes.…”

Section: 1 (mentioning)

confidence: 99%

Research on Multiple Classification Based on Improved SVM Algorithm for Balanced Binary Decision Tree

Xie, She, Qiao 2021

Scientific Programming

Self Cite

Support vector machines (SVMs) were originally designed to solve binary classification problems, but in the real world, there are a lot of multiclassification cases. The multiclassification methods based on SVM are mainly divided into the direct methods and the indirect methods. The indirect methods, which consist of multiple binary classifiers integrated in accordance with certain rules to form the multiclassification model, are the most commonly used multiclassification methods at present. In this paper, an improved multiclassification algorithm based on the balanced binary decision tree is proposed, which is called the IBDT-SVM algorithm. This algorithm considers not only the influence of “between-classes distance” and “class variance” in traditional measures of between-classes separability but also takes “between-classes variance” into consideration and proposes a new improved “between-classes separability measure.” Based on the new “between-classes separability measure,” it finds the two classes with the largest between-classes separability measure and uses them as the positive and negative samples to train and learn the classifier. After that, according to the principle of class-grouping-by-majority, each remaining class is merged into whichever of these two classes it is closer to, and the resulting positive and negative samples are used to train the SVM classifier again. For samples with uneven or sparse distribution, this method can avoid the error caused by the shortest center distance classification method and overcome the “error accumulation” problem existing in the traditional binary decision tree to the greatest extent so as to obtain a better classifier. According to the above algorithm, each layer node of the decision tree is traversed until the output classification result is a single-class label. The experimental results show that the IBDT-SVM algorithm proposed in this paper can achieve better classification accuracy and effectiveness for multiple classification problems.