Oversampling based on generative adversarial networks to overcome imbalance data in predicting fraud insurance claim

Nugraha, Ranu A.; Pardede, Hilman F.; Subekti, Agus

doi:10.48129/kjs.splml.19119

Cited by 3 publications

(3 citation statements)

References 18 publications


“…This study classified the levels of cognitive impairment associated with Parkinson's disease by applying oversampling techniques to three datasets with three different IR values and found that GAN-based oversampling techniques showed better AUC and F1-score values than traditional techniques. Nugraha et al. [37] used imbalanced insurance fraud data and proposed CTGAN as an oversampling method, showing that, across 17 classification models, CTGAN performed better (AUC, F1-score, precision, etc.) than ROS, SMOTE, and ADASYN.…”

Section: Discussion (mentioning)

confidence: 99%

Searching for Optimal Oversampling to Process Imbalanced Data: Generative Adversarial Networks and Synthetic Minority Over-Sampling Technique

Eom, Byeon (2023). Mathematics


Classification problems due to data imbalance occur in many fields and have long been studied in the machine learning field. Many real-world datasets suffer from the issue of class imbalance, which occurs when the sizes of classes are not uniform; thus, data belonging to the minority class are likely to be misclassified. It is particularly important to overcome this issue when dealing with medical data because class imbalance inevitably arises due to incidence rates within medical datasets. This study adjusted the imbalance ratio (IR) within the National Biobank of Korea dataset “Epidemiologic data of Parkinson’s disease dementia patients” to values of 6.8 (raw data), 9, and 19 and compared four traditional oversampling methods with techniques using the conditional generative adversarial network (CGAN) and conditional tabular generative adversarial network (CTGAN). The results showed that when the classes were balanced with CGAN and CTGAN, they showed a better classification performance than the more traditional oversampling techniques based on the AUC and F1-score. We were able to expand the application scope of GAN, widely used in unstructured data, to structured data. We also offer a better solution for the imbalanced data problem and suggest future research directions.
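To make the imbalance ratio concrete, here is a minimal sketch. It assumes IR means the majority-class count divided by the minority-class count (the usual definition, which the abstract does not spell out). The balancing step is plain random oversampling, one of the traditional baselines, not CGAN or CTGAN. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes
    match the size of the largest class. This is random oversampling (ROS),
    a traditional baseline, not a GAN-based generator."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: 95 negatives and 5 positives, so IR = 95 / 5 = 19.
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

print(imbalance_ratio(labels))
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

A GAN-based method would replace the duplication step with samples drawn from a generator trained on the minority class; the bookkeeping around class counts stays the same.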


“…Real-world data, such as data related to fault detection [3], [4], fraud detection [5], [6], and medical diagnosis [7]-[9], often have data imbalance problems. A dataset is called imbalanced if it does not represent the classified categories evenly [10].…”

Section: Introduction (mentioning)

confidence: 99%

“…GAN generates additional data for minority classes by oversampling with the Conditional Tabular GAN (CTGAN) architecture. The generator adjusts the tabular data input and receives supplementary information to produce samples under the specified class conditions [6]. The experimental results show that the proposed method performs better than other oversampling methods on several evaluation metrics: Accuracy, Precision score, F1 score, and AUC.…”

Section: Introduction (mentioning)

confidence: 99%

Improving Classification Performance on Imbalanced Medical Data using Generative Adversarial Network

Siska Rahmadani, Agus Subekti, Haris (2024). Jurnal Ilmu Komputer dan Informasi


In many real-world applications, the problem of data imbalance is a common challenge that significantly affects the performance of machine learning algorithms. Data imbalance means that the target classes are not equally represented. This problem often appears in medical data, where the positive cases of a disease or condition are much fewer than the negative cases. In this paper, we propose to explore a Generative Adversarial Network (GAN)-based oversampling method to improve the performance of the classification algorithm on imbalanced medical datasets. We expect that GAN will be able to learn the actual data distribution and generate synthetic samples that are similar to the original ones. We evaluate our proposed methods on several metrics: Recall, Precision, F1 score, AUC score, and FP rate. These metrics measure the ability of the classifier to correctly identify the minority class and reduce the false positives and false negatives. Our experimental results show that the application of GAN performs better than other methods in several metrics across datasets and can be used as an alternative method to improve the performance of the classification model on imbalanced medical data.
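Four of the metrics named above follow directly from the confusion-matrix counts. The sketch below uses their standard definitions; the abstract does not show the authors' evaluation code, and the function name and toy predictions are hypothetical. AUC is left out because it needs ranked scores rather than hard predictions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Recall, precision, F1 and false-positive rate from hard predictions,
    treating `positive` as the minority class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard each ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {"recall": recall, "precision": precision,
            "f1": f1, "fp_rate": fp_rate}


# Hypothetical toy predictions: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are 3 true positives, 1 false negative, 1 false positive, and 5 true negatives, so recall, precision, and F1 all equal 0.75 and the FP rate is 1/6.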


Addressing the data bottleneck in medical deep learning models using a human-in-the-loop machine learning approach

Mosqueira-Rey, Hernández-Pereira, Bobes-Bascarán et al. (2023). Neural Comput & Applic


Any machine learning (ML) model is highly dependent on the data it uses for learning, and this is even more important in the case of deep learning models. The problem is a data bottleneck, i.e. the difficulty in obtaining an adequate number of cases and quality data. Another issue is improving the learning process, which can be done by actively introducing experts into the learning loop, in what is known as human-in-the-loop (HITL) ML. We describe an ML model based on a neural network in which HITL techniques were used to resolve the data bottleneck problem for the treatment of pancreatic cancer. We first augmented the dataset using synthetic cases created by a generative adversarial network. We then launched an active learning (AL) process involving human experts as oracles to label both new cases and cases that the network found to be suspect. This AL process was carried out simultaneously with an interactive ML process in which feedback was obtained from humans in order to develop better synthetic cases for each iteration of training. We discuss the challenges involved in including humans in the learning process, especially in relation to human–computer interaction, which is acquiring great importance in building ML models and can condition the success of a HITL approach. This paper also discusses the methodological approach adopted to address these challenges.
