Using Cost-Sensitive Learning and Feature Selection Algorithms to Improve the Performance of Imbalanced Classification

Feng, Fang; Li, Kuan‐Ching; Shen, Jun; Zhou, Qing; Yang, Xuhui

doi:10.1109/access.2020.2987364

Cited by 55 publications (25 citation statements)

References 47 publications

“…However, the problems of classifying imbalanced data often occur in real-life applications such as analyzing medical datasets, where the cases of patients with the disease are significantly lower than those without the disease. For instance, in cancer detection, the cases of patients diagnosed with cancer are much smaller than those of patients who do not have cancer [ 4 ]. The classification model to predict cancer results in lower classification performance of abnormal class and incorrect prediction disease which leads to serious health risk.…”

Section: Introduction (mentioning, confidence: 99%)

The Use of Hellinger Distance Undersampling Model to Improve the Classification of Disease Class in Imbalanced Medical Datasets

Al-Shamaa, Kurnaz, Duru, et al. 2020

Applied Bionics and Biomechanics

Imbalanced class distribution in medical datasets is a challenging problem that hinders correct classification of disease. It arises when the healthy class has many more instances than the disease class. To address this problem, we propose undersampling the healthy class instances to improve classification of the disease class. This model is named Hellinger Distance Undersampling (HDUS). It employs the Hellinger distance to measure the resemblance between each majority class instance and its neighbouring minority class instances, in order to separate the classes effectively and boost the discrimination power for each class. Extensive experiments were conducted on four imbalanced medical datasets using three classifiers to compare HDUS with a baseline model and three state-of-the-art undersampling models. The results show that HDUS performs better than the other models in terms of sensitivity, F1 measure, and balanced accuracy.
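
The abstract names the Hellinger distance as the HDUS similarity measure and reports sensitivity, F1 measure, and balanced accuracy. The sketch below shows the standard textbook definitions of these quantities; it is not the authors' HDUS implementation, the function names `hellinger` and `imbalance_metrics` are hypothetical, and because the abstract does not say how HDUS turns instances into distributions, the distance is shown here only between two discrete probability distributions. The disease (minority) class is assumed to be the positive class.

```python
import math
from typing import Dict, Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions of equal length.

    Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


def imbalance_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Minority-aware metrics from binary confusion-matrix counts.

    Assumes the disease (minority) class is the positive class.
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the disease class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the healthy class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": sensitivity,
        "f1": f1,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


if __name__ == "__main__":
    print(hellinger([0.5, 0.5], [0.5, 0.5]))  # 0.0
    print(hellinger([1.0, 0.0], [0.0, 1.0]))  # 1.0
    # Hypothetical test set: 10 diseased and 90 healthy patients; the model
    # detects only 2 diseased patients and labels everyone else healthy.
    # Accuracy is 0.92, yet sensitivity is only 0.2 and balanced accuracy 0.6.
    print(imbalance_metrics(tp=2, fn=8, fp=0, tn=90))
```

The toy counts illustrate why the abstract reports sensitivity and balanced accuracy rather than plain accuracy: a classifier that mostly predicts the healthy class can score high accuracy while missing most disease cases.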

“…The problem of imbalanced classification (misclassification) has been one of the factors in the emergence of a relatively new research topic in the field of machine learning under the name of "cost-sensitive learning" [30]. It consists of associating a "cost" penalty to an incorrect prediction and then trying to minimize the cost of a model on the learning dataset.…”

Section: G. Results and Discussion (mentioning, confidence: 99%)

Border Trespasser Classification Using Artificial Intelligence

et al. 2021

Monitoring the border is a very important task for national security. Wireless sensor networks (WSN) appear well suited to this application. This work aims to monitor a large-scale geographical area representing the borders of countries. The researchers take the Tunisian-Algerian border as an example. This border is characterized by illegal crossings by intruders between the two countries. The task is to identify the intruders and study their kinematics based on speed, acceleration, and bearing. The appropriate types of sensors are determined according to the nature of the intruders. Six classification techniques are compared: Naïve Bayes, Support Vector Machine (SVM), Multilayer Perceptron, Best First Decision Tree (BF-Tree), Logistic Alternating Decision Tree (LAD-Tree), and J48. Their performance is compared in terms of correct differentiation rates, confusion matrices, and the time taken to build each model. Four different levels of cross-validation are used to validate the classifiers. The results indicate that J48 achieves the highest correct classification rate with a relatively low model-building time.
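
The cost-sensitive learning idea quoted above (attach a cost to each kind of incorrect prediction, then minimize total cost) can be illustrated with a cost matrix. The sketch below is a general illustration of the technique, not the method of Feng et al. or of the citing paper; the cost values and the function names `total_cost` and `cost_sensitive_predict` are hypothetical. It scores predictions by total cost and applies the minimum-expected-cost decision rule to a classifier's predicted probability of the positive class.

```python
from typing import Sequence

# Hypothetical cost matrix for a binary problem, indexed COST[actual][predicted].
# Correct predictions cost nothing; a missed positive (false negative) is
# assumed to be five times as costly as a false alarm (false positive).
COST = [[0.0, 1.0],   # actual 0: predicted 0, predicted 1 (false positive)
        [5.0, 0.0]]   # actual 1: predicted 0 (false negative), predicted 1


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               cost: Sequence[Sequence[float]] = COST) -> float:
    """Sum of the misclassification costs of a set of predictions."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))


def cost_sensitive_predict(p_pos: float,
                           cost: Sequence[Sequence[float]] = COST) -> int:
    """Return the class with the lower expected cost, given P(y = 1 | x) = p_pos."""
    expected_if_0 = p_pos * cost[1][0] + (1 - p_pos) * cost[0][0]
    expected_if_1 = p_pos * cost[1][1] + (1 - p_pos) * cost[0][1]
    return 1 if expected_if_1 < expected_if_0 else 0


if __name__ == "__main__":
    y_true = [0, 1, 0, 1]
    probs = [0.1, 0.2, 0.4, 0.7]  # hypothetical classifier outputs
    default = [int(p >= 0.5) for p in probs]
    weighted = [cost_sensitive_predict(p) for p in probs]
    print(default, total_cost(y_true, default))    # [0, 0, 0, 1] 5.0
    print(weighted, total_cost(y_true, weighted))  # [0, 1, 1, 1] 1.0
```

With these costs the rule predicts the positive class whenever `p_pos` exceeds 1 / (1 + 5), about 0.17, instead of the usual 0.5. This form leaves the trained classifier unchanged; an alternative is to build the costs into training, for example as per-class sample weights.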

“…To study the effectiveness of CGAN in addressing the issue of an imbalanced dataset, a comparison is made with two typical approaches: the synthetic minority oversampling technique (SMOTE) [42,43] and cost-sensitive learning (CSL) [44,45]. Table 8 presents the performance of CGAN-IFCM, SMOTE-IFCM, and CSL-IFCM in binary and multi-class voice disorder detection.…”

Section: Comparison Between CGAN, SMOTE and CSL (mentioning, confidence: 99%)

Combined Generative Adversarial Network and Fuzzy C-Means Clustering for Multi-Class Voice Disorder Detection with an Imbalanced Dataset

2020

The world has witnessed the success of artificial intelligence deployment for smart healthcare applications. Various studies have suggested that the prevalence of voice disorders in the general population is greater than 10%. Automatic diagnosis of voice disorders with machine learning algorithms is desirable to reduce the cost and time of examination by doctors and speech-language pathologists. In this paper, an algorithm combining a conditional generative adversarial network (CGAN) with improved fuzzy c-means clustering (IFCM), called CGAN-IFCM, is proposed for multi-class detection of three common types of voice disorders. The existing benchmark datasets for voice disorders, the Saarbruecken Voice Database (SVD) and the Voice ICar fEDerico II Database (VOICED), have imbalanced classes. The generative adversarial network supplies synthetic data to reduce bias in the detection model. Improved fuzzy c-means clustering incorporates the relationship between adjacent data points into the fuzzy membership function. To demonstrate the need for CGAN and IFCM, the algorithm is compared with and without CGAN, and IFCM is compared with traditional fuzzy c-means clustering. Finally, the proposed CGAN-IFCM outperforms existing models in true negative rate by 9.9–12.9% and in true positive rate by 9.1–44.8%.
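
The comparison quoted above uses SMOTE as an oversampling baseline. The sketch below shows the core SMOTE interpolation step in plain Python: each synthetic point lies on the segment between a minority sample and one of its k nearest minority-class neighbours. It is a simplified illustration, assuming purely numeric features and Euclidean distance; the function name `smote` and its parameters are hypothetical, and it is not the implementation used in the cited papers (in practice a library such as imbalanced-learn is the usual choice).

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Sequence[float]], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Point]:
    """Generate synthetic minority samples by SMOTE-style interpolation.

    For each synthetic point: pick a random minority sample x, pick one of its
    k nearest minority-class neighbours z, and return x + u * (z - x) with
    u drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)  # seeded for reproducibility
    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class, excluding x itself
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(x, minority[j]))[:k]
        z = minority[rng.choice(neighbours)]
        u = rng.random()
        synthetic.append([xi + u * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]  # toy minority class
    for point in smote(minority, n_synthetic=3, k=2):
        print(point)
```

Because every synthetic point is an interpolation between two real minority samples, SMOTE stays inside the region the minority class already occupies; a generative model such as the CGAN in the abstract above instead learns to sample new points from an estimated data distribution.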
