2021
DOI: 10.11591/ijai.v10.i2.pp346-354

Handling the imbalanced data with missing value elimination SMOTE in the classification of the relevance education background with graduates employment

Abstract: Imbalanced data affect the accuracy of models, especially precision and sensitivity, and make it difficult to find information on the minority class. The problem is identified in the Universitas Sriwijaya tracer study dataset, which has 2934 records. The label attribute is divided into several label classes, namely not tight, somewhat-tight, tight, very tight, and tightest. The number of the tightest and very tight is 27% and 38.6…

Citations: cited by 9 publications (6 citation statements)
References: 26 publications
“…In the case of a large sample size, the number of students at risk will be significantly lower, and hence, in such situations of highly imbalanced data, the present model may be quite useful. The highest prediction accuracy achieved in the present study is 95.45%, which is greater than most of the previous studies [12][13][14][15][16][17][18]. Along with the enhanced prediction accuracy, the main advantage of the present work is that the methodology proposed in the present study is scalable from one context to the other.…”
Section: Results (mentioning)
confidence: 50%
“…Further, Ghavidel et al [16] solved the problem of imbalanced data by using a combination of the SVM-SMOTE (an over-sampling technique) and Edited-Nearest-Neighbor (an under-sampling technique) while predicting disease mortality. Recently, Desiani et al [17] applied k-Nearest Neighbor (k-NN), Artificial Neural Network (ANN), and C4.5 to students' educational background records along with SMOTE to make the dataset balanced, and that balanced dataset increased the accuracy of prediction, and for k-NN the maximum achieved accuracy was 83.71%.…”
Section: Related Work (mentioning)
confidence: 99%
“…The total pixels of all images in the segmentation prediction result were included in a confusion matrix. The confusion matrix was applied to calculate accuracy, sensitivity, and specificity for the proposed method [31], [32]. The confusion matrix obtained by the blood vessel segmentation process is displayed in Table 3.…”
Section: Results (mentioning)
confidence: 99%
“…Desiani et al [48] proposed a model based on a Universitas Sriwijaya dataset with 2,934 records. The researchers tried to identify and solve the minority class labels (tightest 27%, and very tight 38.6%).…”
Section: Literature Review (mentioning)
confidence: 99%