Resampling Imbalanced Class and the Effectiveness of Feature Selection Methods for Heart Failure Dataset

Khaldy, Mohammad Al; Kambhampati, Chandrasekhar

doi:10.15406/iratj.2018.04.00090

Cited by 24 publications

(26 citation statements)

References 29 publications

(25 reference statements)

“…The resample has improved classification performance significantly, even on high-dimensional data without using the feature selection method in line with previous studies. This previous research has shown that the feature selection methods' role, except for the information gain, is still lesser than the resample method to improve the classification performance, including accuracy, sensitivity, and precision [20].…”

Section: Results (mentioning)

confidence: 99%
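
The information gain criterion named in the statement above can be illustrated with a short sketch. It assumes discrete feature values and uses hypothetical toy data; the function names are illustrative and are not taken from the cited papers.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy remaining after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one feature that separates the classes, one that does not.
labels = ["sick", "sick", "healthy", "healthy"]
informative = ["high", "high", "low", "low"]
uninformative = ["a", "b", "a", "b"]

ig_informative = information_gain(informative, labels)
ig_uninformative = information_gain(uninformative, labels)
```

A feature selection step based on this score would rank features by their gain and keep the highest-ranked ones.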

“…SMOTE can obscure the information of data interrelations in every class, thereby reducing the k-NN performance. As in previous studies, the information of interrelation data can be obscured due to the addition of unique samples that lead to generalization errors in the classifier [10], [18], [20]. Because of this flaw, SMOTE needs to be developed as in previous studies to improve the interclass boundary [10], [14], [23].…”

Section: Results (mentioning)

confidence: 99%
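
To make the concern about added synthetic samples concrete, here is a minimal sketch of the interpolation idea behind SMOTE. It is a simplification under stated assumptions (numeric features, Euclidean distance, a hypothetical `smote_sketch` helper), not the implementation used in any of the cited studies.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # A random position on the segment between the base point and its neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class points in two dimensions.
minority_points = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.0), (1.2, 1.8)]
new_points = smote_sketch(minority_points, n_new=6)
```

Because every synthetic point lies between two existing minority points, the new samples never appear in the original data, which is the property the statement above links to blurred class boundaries.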

“…High-dimensional data is also a challenge in data mining because it can increase calculation complexity in data interpretation and potentially reduces classification performance [20]. Besides that, the data in this research is possibly unbalanced.…”

Section: A Workflow and Dataset (mentioning)

confidence: 99%

“…Oversampling replicates data from minority classes to match the majority and adds information between instances in minority classes. The undersampling process reduces the majority class's frequency by replacing or deleting some data in the sample, so the data's composition is balanced [20]. However, undersampling has a risk of losing data in the majority class if the data apparently may improve the classification process [23], [24].…”

Section: Resample (mentioning)

confidence: 99%
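
The two resampling strategies described in the statement above can be sketched as follows. This is a minimal illustration of random oversampling (duplicating minority rows) and random undersampling (deleting majority rows); the helper names and toy data are hypothetical.

```python
import random
from collections import Counter


def group_by_class(rows, labels):
    """Collect the rows belonging to each class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical imbalanced data: 8 majority rows and 2 minority rows.
rows = [(i,) for i in range(10)]
labels = ["majority"] * 8 + ["minority"] * 2

_, over_labels = random_oversample(rows, labels)
_, under_labels = random_undersample(rows, labels)
over_counts = Counter(over_labels)
under_counts = Counter(under_labels)
```

Oversampling keeps all 8 majority rows and grows the minority class to 8, while undersampling discards 6 majority rows, which is the information loss the statement warns about.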

A proposed method for handling an imbalance data in classification of blood type based on Myers-Briggs type indicator

Akbar

Husaini

Akbar

et al. 2020

Jurnal Teknologi dan Sistem Komputer

Blood type is still commonly assumed to be related to some aspects of personality. This study examines preprocessing methods for improving the accuracy of classifying blood type from MBTI data. The training and testing data consist of 250 records from MBTI questionnaire answers given by 250 respondents. Classification uses the k-Nearest Neighbor (k-NN) algorithm. Without preprocessing, k-NN reaches only about 32% accuracy, so preprocessing is needed to handle the data imbalance before classification. The proposed preprocessing consists of two stages: the first is an unsupervised resample, and the second is a supervised resample. Validation uses ten cross-validations. After these preprocessing stages, k-NN classification shows significantly higher accuracy, F-score, and recall.
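
The evaluation setup in this abstract can be sketched roughly as below. The sketch assumes that "ten cross-validations" refers to 10-fold cross-validation, uses Euclidean distance and k = 3, and runs on hypothetical toy points rather than the MBTI data; none of these choices are confirmed by the abstract.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k training points closest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validated_accuracy(xs, ys, folds=10, k=3):
    """Hold out each fold once, classify it with k-NN, and return overall accuracy."""
    correct = 0
    for fold in range(folds):
        # Every folds-th sample, starting at index fold, forms the held-out fold.
        test_idx = set(range(fold, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        for i in test_idx:
            correct += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return correct / len(xs)


# Hypothetical toy data: two well-separated clusters of 10 points each.
xs = [(i * 0.1, 0.0) for i in range(10)] + [(5.0 + i * 0.1, 5.0) for i in range(10)]
ys = ["A"] * 10 + ["B"] * 10
accuracy = cross_validated_accuracy(xs, ys, folds=10, k=3)
```

On real, imbalanced data the same loop would report much lower accuracy for the minority classes, which is the motivation the abstract gives for resampling first.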

“…The class with many instances is called the majority class, and the class with fewer instances is called the minority class [10]. In real-life situations the minority class is sometimes more interesting than the majority class, for example in medical data on heart failure cases [11], in economics such as credit scoring [12] and credit card fraud, where misused credit cards are fewer than those that are not misused, and in other fields such as email spam detection, where spam emails are fewer than non-spam emails.…”

Section: Machine Learning (ML) Is a Subfield of (unclassified)

Oversampling Method on Classifying Hypertension Using Naive Bayes, Decision Tree, and Artificial Neural Network

Chamidah

Santoni

Matondang

2020

RESTI

Oversampling is a technique to balance the number of data records for each class by generating data with a small number of records in a class, so that the amount is balanced with data with a class with a large number of records. Oversampling in this study is applied to hypertension dataset where hypertensive class has a small number of records when compared to the number of records for non-hypertensive classes. This study aims to evaluate the effect of oversampling on the classification of hypertension dataset consisting of hypertensive and non-hypertensive classes by utilizing the Naïve Bayes, Decision Tree, and Artificial Neural Network (ANN) as well as finding the best model of the three algorithms. Evaluation of the use of oversampling on hypertension dataset is done by processing the data by imputing missing values, oversampling, and transforming data into the same range, then using the Naïve Bayes, Decision Tree, and ANN to build classification models. By dividing 80% of data as training data to build models and 20% as validation data for testing models, we had an increase in classification performance in the form of accuracy, precision, and recall of the oversampled data when compared without oversampling. The best performance in this study resulted in the highest accuracy using ANN with 0.91, precision 0.86 and recall 0.99.
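
The three metrics reported in this abstract can be computed from a model's predictions as in the sketch below. The labels and predictions are hypothetical toy values, not the study's data, and `binary_metrics` is an illustrative helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical validation labels: 3 positive (e.g. hypertensive) and 7 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

In this toy case accuracy is 0.8 while precision and recall are both about 0.67, which illustrates why imbalanced-data studies report all three rather than accuracy alone.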

Wrapper Based Approach for Network Intrusion Detection Model with Combination of Dual Filtering Technique of Resample and SMOTE

Awujoola

Ogwueleka

Irhebhude

et al. 2021

Artificial Intelligence for Cyber Security: Methods, Issues and Possible Horizons or Opportunities
