Handling Imbalance Data in Classification Model with Nominal Predictors

Fithriasari, Kartika; Hariastuti, Iswari; Wening, Kinanthi Sukma

doi:10.12962/j24775401.v6i1.6643

Cited by 7 publications

(7 citation statements)

References 5 publications


“…In this study, the data mining process involves balancing the data and splitting it into training data and test data. Data balancing equalizes the amount of data in each class to improve the accuracy of the system during the learning process [19]. In this stage, the classification model was trained to group TB data into the pulmonary and extrapulmonary classes.…”

Section: Data Mining Process (mentioning)

confidence: 99%

“…where 𝑋𝑖 = vector of features in the minority class, 𝑋knn = k-nearest neighbors for 𝑋𝑖, and 𝛿 = random number between 0 and 1. After balancing the data, the next step is to divide the data set into k to n partitions. Splitting the data in this way is known as K-Fold Cross Validation, a popular method in which the data is divided into two subsets: training data for the learning process, and test data for validating or assessing the performance of models, methods, or algorithms [19]. K-Fold cross validation can be selected based on dataset size.…”

Section: Data Mining Process (mentioning)

confidence: 99%
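The citation statement above defines the symbols of the SMOTE interpolation step but its formula is elided. A minimal sketch of the standard form, x_new = 𝑋𝑖 + 𝛿 (𝑋knn − 𝑋𝑖), together with a plain k-fold index split, might look like the following. It assumes numeric feature vectors; the function names `smote_sample` and `k_fold_indices` and the toy values are hypothetical and do not come from the cited papers.

```python
import random


def smote_sample(x_i, x_knn, rng):
    """Create one synthetic point on the segment between x_i and x_knn."""
    delta = rng.random()  # random number in [0, 1)
    return [a + delta * (b - a) for a, b in zip(x_i, x_knn)]


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


rng = random.Random(0)  # fixed seed so the sketch is repeatable
x_new = smote_sample([1.0, 2.0], [3.0, 6.0], rng)  # toy minority point and neighbor
folds = list(k_fold_indices(10, 5))  # 5 folds over 10 toy samples
```

In each of the k rounds one fold serves as test data and the remaining folds as training data, so every sample is tested exactly once.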

“…The pulmonary class contains 596 records, while the extrapulmonary class contains 389, so a method is needed to balance the data [19]. This method replicates minority data randomly, using KNN to select the neighbors, until the minority class is balanced with the majority class.…”

Section: Attribute (mentioning)

confidence: 99%
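As a small worked example of the arithmetic behind this statement, the sketch below computes how many synthetic minority records would be needed to equalize the two classes. The counts 596 and 389 are taken from the quote; the variable names are hypothetical, and full equalization is an assumption about the balancing target.

```python
# Class counts quoted in the citation statement.
counts = {"pulmonary": 596, "extrapulmonary": 389}

majority = max(counts.values())

# Synthetic records each class would need to reach the majority size
# (assumes the goal is exact equality between classes).
needed = {label: majority - n for label, n in counts.items()}

# Imbalance ratio: majority size divided by minority size.
ratio = majority / min(counts.values())

print(needed)           # records to generate per class
print(round(ratio, 2))  # imbalance ratio
```

Under these assumptions the extrapulmonary class would need 207 additional records, and the imbalance ratio is about 1.53.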


Utilizing LSTM and K-NN for Anatomical Localization of Tuberculosis: A Solution for Incomplete Data

Rochman, Miswanto, Suprajitno, et al. 2023. MMEP


Tuberculosis (TB) is a prevalent lung disease that significantly contributes to mortality rates, with an estimated 98,000 fatalities observed in Indonesia alone. TB can be classified into two categories based on its anatomical location: pulmonary, when detected in lung parenchyma tissue, and extrapulmonary, when identified in organs outside the lungs. Current diagnostic procedures necessitate numerous laboratory tests and manual assessments, which are both time-consuming and susceptible to data incompleteness, thereby potentially influencing the diagnostic outcomes. This necessitates the development of a rapid and accurate classification system for the anatomical location of TB, which could aid medical professionals in diagnosis. In this study, we propose a novel classification system that utilizes the K-Nearest Neighbors (K-NN) algorithm to handle missing data, and the Synthetic Minority Over-sampling Technique (SMOTE) for data balancing. For the classification of pulmonary and extrapulmonary TB, the study employs the Long Short-Term Memory (LSTM) method, the performance of which is compared with other models, namely Naïve Bayes, Support Vector Machine (SVM), and Backpropagation. Although all four models demonstrated high levels of accuracy, the LSTM method outperformed the others, achieving 100% accuracy compared to Naïve Bayes (99.4%), SVM (99.3%), and Backpropagation (99.7%). These results were obtained after implementing imputation and class balancing stages, and optimizing LSTM features such as the tanh activation function, learning rate of 0.01, 100 LSTM units, and the ADAM optimizer. The proposed system thus presents an effective solution for the rapid and accurate classification of TB based on anatomical location.


“…The data preprocessing stage also includes data transformation, which covers data generalization, smoothing, normalization, and attribute construction. Imbalanced data must also be handled, because class imbalance makes the reported accuracy unreliable [7]. The accuracy of an algorithm can be improved after data preprocessing [8].…”

Section: Introduction (unclassified)
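The quoted point that class imbalance distorts accuracy can be illustrated with a small sketch. It assumes a hypothetical test set of 95 majority and 5 minority samples and a trivial classifier that always predicts the majority class; none of these numbers come from the cited papers.

```python
# Hypothetical labels: 95 majority-class (0) and 5 minority-class (1) samples.
y_true = [0] * 95 + [1] * 5

# A trivial classifier that always predicts the majority class.
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall per class: fraction of that class's samples predicted correctly.
minority_hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
majority_hits = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
minority_recall = minority_hits / y_true.count(1)
majority_recall = majority_hits / y_true.count(0)

# Balanced accuracy averages the per-class recalls.
balanced_accuracy = (minority_recall + majority_recall) / 2

print(accuracy, minority_recall, balanced_accuracy)
```

Accuracy looks high even though no minority sample is ever detected, which is why per-class recall or balanced accuracy is often reported alongside it.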

Analisis Sentimen Terhadap Game Genshin Impact Menggunakan Bert [Sentiment Analysis of the Game Genshin Impact Using BERT]

Kusnadi, Yusuf, Andriantony, et al. 2021. rabit


With the rapid growth of internet services on social networks, large amounts of information are continuously generated in real time. Recently, sentiment analysis using reviews and messages has become a popular research topic in the field of Natural Language Processing. Over the years, online games have become an activity inseparable from most people's lives, especially because of the economic disruption caused by Covid-19. Genshin Impact is a well-known game developed by miHoYo. This research focuses on sentiment analysis, with the aim of determining whether trusted reviews collected from the Google Play Store carry neutral, positive, or negative sentiment, so as to help the future development of the game. An automatic sentiment classification process is needed to reduce errors caused by human resources. Nevertheless, studies that discuss feature extraction and deep learning models suited to this case are very rare, especially in the gaming business. The stages of this research are extracting data from the Google Play Store and using Bidirectional Encoder Representations from Transformers (BERT) as the artificial intelligence model.


“…Fithriasari et al. (2020) studied the handling of imbalanced data in classification models with nominal predictors [16]. They used data from the Survei Kinerja dan Akuntabilitas Kependudukan Keluarga Berencana dan Pembangunan Keluarga (SKAP KKBPK) for Jawa Timur Province in 2018.…”

mentioning

confidence: 99%

Model optimisation of class imbalanced learning using ensemble classifier on over-sampling data

Kurniawati, Prabowo. 2022. IJ-AI


Data imbalance is one of the problems in the application of machine learning and data mining. This imbalance often occurs in the most essential and needed case entities. Two approaches to overcome this problem are the data-level approach and the algorithm-level approach. This study aims to find the best model on the pap smear dataset by combining the data-level approach with the algorithm-level approach to address data imbalance. Laboratory data are often small and imbalanced, and in almost every case the minority entities are the most important and needed. The data-level approach used in this study is over-sampling with the synthetic minority oversampling technique-nominal (SMOTE-N) and adaptive synthetic-nominal (ADASYN-N) algorithms. The algorithm-level approach is an ensemble classifier using AdaBoost and bagging with the classification and regression tree (CART) as the base learner. In the experiments, the best model in terms of accuracy, precision, recall, and F-measure was obtained with ADASYN-N and AdaBoost-CART.
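For nominal predictors, interpolation between feature values is not meaningful, so SMOTE-N-style over-sampling instead builds each synthetic record from the category values of a minority sample and its nearest neighbors. The sketch below is a simplified illustration, not the method used in the abstract above: it assumes Hamming distance in place of the value difference metric usually associated with SMOTE-N, and the function names and toy records are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two nominal records differ."""
    return sum(x != y for x, y in zip(a, b))


def nominal_synthetic(idx, minority, k):
    """Build one synthetic record by per-feature majority vote.

    The vote is taken over the base record and its k nearest minority
    neighbors (Hamming distance is an assumption of this sketch).
    """
    base = minority[idx]
    others = [row for j, row in enumerate(minority) if j != idx]
    neighbors = sorted(others, key=lambda row: hamming(base, row))[:k]
    group = [base] + neighbors
    # Ties resolve to the value seen first, which favors the base record.
    return [Counter(col).most_common(1)[0][0] for col in zip(*group)]


# Hypothetical minority-class records with three nominal features.
minority = [
    ["urban", "high", "yes"],
    ["rural", "high", "yes"],
    ["rural", "high", "no"],
    ["urban", "low", "no"],
]
synthetic = nominal_synthetic(0, minority, k=2)
print(synthetic)
```

Because the result is a vote over existing categories, every synthetic value is a category that already occurs in the minority class.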
