Oversampling Method To Handling Imbalanced Datasets Problem In Binary Logistic Regression Algorithm

Ustyannie, Windyaning; Suprapto, Suprapto

doi:10.22146/ijccs.37415

Cited by 8 publications

(7 citation statements)

References 7 publications

“…The results of this study show that RWO-sampling with random replication achieved better accuracy than RWO-sampling with the roulette approach and ROS. Testing for the underfitting problem in logistic regression showed that oversampling outperformed non-oversampling, with accuracy rising by an average of 2.3% per data set (Ustyannie & Suprapto, 2020).…”

Section: Previous Research (unclassified)

Model Klasifikasi Pada Seleksi Mahasiswa Baru Penerima KIP Kuliah Menggunakan Regresi Logistik Biner

Susetyoko

Yuwono

Purwantini

2022

JIP

Each institution selects new students for the Kartu Indonesia Pintar Kuliah (KIP Kuliah) scholarship in order to choose those who genuinely combine good academic potential with limited economic means. This study uses binary logistic regression as the classification model. The preprocessed data are split into training and testing sets. Several logistic regression models are compared: models fitted on the original data, on normalized data, on undersampled data, on oversampled data, and on data that combine oversampling and undersampling. The models are evaluated on the significance of their parameters and on classification performance derived from the confusion matrix. Of the seven logistic regression models compared, the best is the one fitted on the original data, with a mean F1 score of 92.40%, mean recall of 87.93%, accuracy of 88.01%, precision of 97.92%, and AUC of 84.6%.

“…Scenario 3 is testing the model using a training and validation data with oversampling technique, by generalizing the amount of data in each class based on the data in the most populated class. The assumption is that rather than discarding important data, it is better to duplicate the data in the least class to balance the amount of data in the most class [15]. So based on Table 4, the most populated class is the Cyber Physical System class with 459 rows, by applying the oversampling technique it produced a total of 1836 rows with 459 rows in each class.…”

Section: Precision = TP / (TP + FP) (mentioning)

confidence: 99%

Analysis of Expertise Group Using The Fuzzy K-NN Classification Algorithm (Case Study: School of Computing Telkom University)

Kusuma

Kurniati

Karo

2022

Jur. Ris. Kom.

The School of Computing at Telkom University has four Expertise Groups that define the courses students take. The choice of Expertise Group influences the choice of elective courses and the topic of the Final Project. Many students still have difficulty choosing an Expertise Group and end up picking the most popular one without considering their own potential and abilities. A wrong choice of Expertise Group can delay graduation, which in turn affects study program accreditation and university ranking, especially the timely graduation indicator. A system is therefore needed that can predict the Expertise Group for School of Computing students from their academic scores. In this study, the Fuzzy K-Nearest Neighbor classification algorithm was chosen because it determines the class from the nearest neighbors and accounts for ambiguous data through the weighting value assigned to each class. Five tests were carried out to find the best model: (1) examine the best split of training and validation data, (2) examine the best K value, (3) compare Fuzzy K-Nearest Neighbor with Naïve Bayes and Decision Tree (C4.5), two commonly used classification algorithms, (4) examine accuracy, precision, recall, and F1-score, and (5) examine accuracy using cross-validation. The model built with Fuzzy K-Nearest Neighbor achieved an accuracy of 72% on the imbalanced data, 62% with undersampling, and 56% with oversampling. Compared with the other two algorithms, Fuzzy K-Nearest Neighbor had higher accuracy on the imbalanced data and with undersampling, but lower accuracy with oversampling, because it is weak at handling minority-class data with little variation.
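The oversampling scenario quoted for this paper (every class raised to the 459 rows of the largest class, 1836 rows in total) can be illustrated with a minimal Python sketch of random oversampling by duplication. This is not the authors' code: the function name `random_oversample`, the three smaller class names, and their sizes are hypothetical, and only the 459-row Cyber Physical System class and the 1836-row total come from the quoted statement.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the most populated class."""
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # The target size is the size of the most populated class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical class sizes: only the 459-row majority class is from the quote.
sizes = {"Cyber Physical System": 459, "Group B": 300, "Group C": 200, "Group D": 100}
labels = [name for name, n in sizes.items() for _ in range(n)]
rows = list(range(len(labels)))  # stand-in feature rows

balanced_rows, balanced_labels = random_oversample(rows, labels)
counts = Counter(balanced_labels)
print(len(balanced_labels), dict(counts))
```

Because the added rows are exact copies, the minority classes gain no new variation, which is consistent with the abstract's remark that Fuzzy K-Nearest Neighbor performed worse after oversampling.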

“…Classification is the process of finding a model that describes and distinguishes data concepts or classes, with the aim of predicting the class of an object whose class is unknown (Tan et al., 2006). Naive Bayes is a frequently used classification method because its algorithm is faster and simpler and is robust to outliers (Prasetyo, 2012).…”

Section: Introduction (unclassified)

“…The Naive Bayes algorithm is rooted in Bayes' theorem. Bayes' theorem is a theorem based on the concept of conditional probability (Tan et al., 2006). The method is a statistical approach to inductive inference in classification problems.…”

Section: Naive Bayes (unclassified)
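For reference, the conditional-probability relationship the statement refers to can be written out as follows. The notation is an assumption made for illustration and is not taken from the citing paper: C is a class and X = (x_1, ..., x_n) is a feature vector. The first line is Bayes' theorem; the second is the conditional-independence assumption that makes the classifier "naive".

```latex
P(C \mid X) = \frac{P(X \mid C)\, P(C)}{P(X)}
\qquad\text{with}\qquad
P(X \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
```

The predicted class is the C that maximizes P(C) multiplied by the product of the P(x_i | C) terms, since P(X) is the same for every class.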

PENANGANAN KLASIFIKASI KELAS DATA TIDAK SEIMBANG DENGAN RANDOM OVERSAMPLING PADA NAIVE BAYES (Studi Kasus: Status Peserta KB IUD di Kabupaten Kendal)

Fitriani

Yasin

Tarno

2021

J.Gauss

The Family Planning Program (KB) launched by the Government of Indonesia to control population growth does not always produce the desired results. In 2017, 7 of the 1,102 new IUD users in Kendal Regency experienced contraceptive failure, so the ratio of failure to success among new IUD users was 0.64% : 99.36%. This strongly unbalanced ratio makes the outcome difficult to predict. One way of handling imbalanced data is oversampling, for example Random Oversampling (ROS). Naive Bayes is used for classification because it is an easy and efficient learning model. The data in this study comprise 14 independent variables and 1 dependent variable. The results indicate that the G-mean of Naive Bayes is less than 60%, while the G-mean of ROS-Naive Bayes is 96.6%. It can be concluded that, in this study, the ROS-Naive Bayes method is better than the Naive Bayes method at detecting the success status of IUD family planning in Kendal Regency. Keywords: Naive Bayes, Random Oversampling, G-mean
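To show why this abstract reports G-mean rather than accuracy, here is a small Python sketch. It assumes the usual definition of G-mean as the geometric mean of sensitivity and specificity; the abstract does not spell out its formula. The helper names `g_mean` and `accuracy` are hypothetical, and so is the "always predict success" classifier. Only the counts (7 failures among 1,102 users) come from the abstract.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall on the positive class)
    and specificity (recall on the negative class)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def accuracy(tp, fn, tn, fp):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical classifier that labels every user a success.
# "Failure" is treated as the positive class: 7 failures, 1095 successes.
tp, fn, tn, fp = 0, 7, 1095, 0

print(f"accuracy = {accuracy(tp, fn, tn, fp):.4f}")
print(f"G-mean   = {g_mean(tp, fn, tn, fp):.4f}")
```

Under these assumptions the classifier is more than 99% accurate yet has a G-mean of zero, because it never detects a failure. G-mean rewards a model only when it performs well on both classes.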
