Handling the imbalanced data with missing value elimination SMOTE in the classification of the relevance education background with graduates employment

Desiani, Anita; Yahdin, Sugandi; Kartikasari, Annisa; Irmeilyana, Irmeilyana

doi:10.11591/ijai.v10.i2.pp346-354

Cited by 9 publications

(6 citation statements)

References 26 publications


“…In the case of a large sample size, the number of students at risk will be significantly lower, and hence, in such situations of highly imbalanced data, the present model may be quite useful. The highest prediction accuracy achieved in the present study is 95.45%, which is greater than most of the previous studies [12][13][14][15][16][17][18]. Along with the enhanced prediction accuracy, the main advantage of the present work is that the methodology proposed in the present study is scalable from one context to the other.…”

Section: Results (mentioning)

confidence: 50%

“…Further, Ghavidel et al [16] solved the problem of imbalanced data by using a combination of the SVM-SMOTE (an over-sampling technique) and Edited-Nearest-Neighbor (an under-sampling technique) while predicting disease mortality. Recently, Desiani et al [17] applied k-Nearest Neighbor (k-NN), Artificial Neural Network (ANN), and C4.5 to students' educational background records along with SMOTE to make the dataset balanced, and that balanced dataset increased the accuracy of prediction, and for k-NN the maximum achieved accuracy was 83.71%.…”

Section: Related Work (mentioning)

confidence: 99%


A Scalable Machine Learning-based Ensemble Approach to Enhance the Prediction Accuracy for Identifying Students at-Risk

Verma, Yadav, Kholiya

2022

IJACSA


Among the educational data mining problems, the early prediction of students' academic performance is the most important task, so that timely and requisite support may be provided to the students who need it. Machine learning techniques may be used as an important tool for predicting low performers in educational institutions. In the present paper, five single supervised machine learning techniques have been used: Decision Tree, Naïve Bayes, k-Nearest-Neighbor, Support Vector Machine, and Logistic Regression. To analyze the effect of an imbalanced dataset, the performance of these algorithms has been checked with and without various resampling methods such as Synthetic Minority Oversampling Technique (SMOTE), Borderline SMOTE, SVM-SMOTE, and Adaptive Synthetic (ADASYN). The random hold-out method was used for model validation and GridSearchCV for hyper-parameter tuning. The results of the present study indicated that Logistic Regression is the best performing classifier with every balanced dataset generated using all four resampling techniques, and it achieved the highest accuracy of 94.54% with SMOTE. Furthermore, to improve the prediction results and to make the model scalable, the most suitable classifier was integrated with the help of bagging, and a well-accepted accuracy of 95.45% was achieved.
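The core step of SMOTE, the resampling method named in this abstract, can be illustrated with a minimal sketch. This is not the authors' code and not a library API: the function name `smote`, its parameters, and the toy minority points are all hypothetical, and the sketch covers only plain SMOTE (not Borderline SMOTE, SVM-SMOTE, or ADASYN).

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        # Place the new sample at a random position on the segment base -> mate.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies; the original samples are left unchanged and the synthetic ones are appended to the training set only.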


“…Total pixels of all image in segmentation prediction result would be included in a confusion matrix. Confusion matrix was applied to calculate accuracy, sensitivity, and specificity for the proposed method [31], [32]. The confusion matrix obtained by the blood vessel segmentation process was displayed in Table 3.…”

Section: Results (mentioning)

confidence: 99%

Contrast enhancement for improved blood vessels retinal segmentation using top-hat transformation and otsu thresholding

Arhami, Desiani, Yahdin et al.

2022

Int. J. Adv. Intell. Informatics

Self Cite


Diabetic retinopathy is an effect of diabetes that results in abnormalities in the retinal blood vessels. These abnormalities can cause blurry vision and blindness. Automatic segmentation of retinal blood vessels in retinal images can detect these abnormalities, resulting in faster and more accurate segmentation results. The paper proposed an automatic blood vessel segmentation method that combined Otsu thresholding with image enhancement techniques. For image enhancement, it combined CLAHE with the top-hat transformation to improve image quality. The study used the DRIVE dataset, which provides retinal images generated by a fundus camera. CLAHE and the top-hat transformation were applied to raise the contrast and reduce noise in the images. Good-quality images help the segmentation process locate blood vessels in retinal images correctly, which improved the performance of the segmentation method. Otsu thresholding was used to separate blood vessel pixels from background pixels by local threshold. To evaluate the performance of the proposed method, the study measured accuracy, sensitivity, and specificity. On the DRIVE dataset, the average accuracy, sensitivity, and specificity were 94.7%, 72.28%, and 96.87%, respectively. These results indicate that the proposed method works well for blood vessel segmentation in retinal images, especially for thick blood vessels.
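The three evaluation measures named in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `confusion_metrics` and the pixel counts are hypothetical and are not taken from the paper's Table 3.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: vessel pixels predicted as vessel/background.
    tn/fp: background pixels predicted as background/vessel.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all pixels labelled correctly
        "sensitivity": tp / (tp + fn),    # share of vessel pixels that were found
        "specificity": tn / (tn + fp),    # share of background pixels kept as background
    }


# Made-up counts for illustration only.
metrics = confusion_metrics(tp=80, fp=10, tn=190, fn=20)
print(metrics)
```

When background pixels far outnumber vessel pixels, accuracy and specificity can both be high while sensitivity stays noticeably lower, which is why the three values are usually reported together.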


“…Desiani et al [48] are proposed a model based on a dataset of Universitas Sriwijaya with 2,934 records. The researchers tried to identify and solve the minority class labels (tightest 27%, and very tight 38.6%).…”

Section: Literature Review (mentioning)

confidence: 99%

A prediction model based machine learning algorithms with feature selection approaches over imbalanced dataset

Hamoud, Kamel, Gaafar et al.

2022

IJEECS


Much research in the educational sector has addressed predicting student performance with supervised and unsupervised machine learning algorithms. Most student performance data are imbalanced, meaning the final classes are not equally represented. Along with the size of the dataset, this problem affects the model's prediction accuracy. In this paper, the Synthetic Minority Oversampling Technique (SMOTE) filter is applied to the dataset to find its effect on the model's accuracy. Four feature selection approaches are applied to find the most correlated attributes that affect the students' performance. The SMOTE filter is examined before and after applying feature selection approaches to measure the model's accuracy with supervised and unsupervised algorithms. Three supervised/unsupervised algorithms are examined based on feature selection approaches to predict the students' performance. The findings show that supervised algorithms (LMT, Simple Logistic, and Random Forest) achieved high accuracy after applying SMOTE without feature selection. The prediction accuracies of unsupervised algorithms (Canopy, EM, and Farthest First) improved after applying feature selection approaches and the SMOTE filter.
