Emilija Strelcenia scite author profile

2023

MAKE

Data augmentation is an important procedure in deep learning, and GAN-based data augmentation can be applied in many domains. In credit card fraud detection, for instance, class imbalance is a major problem because fraudulent transactions are a small minority compared with legitimate payments. Generative techniques are considered an effective way to address this imbalance, as they balance the minority and majority classes before training. More recently, Generative Adversarial Networks (GANs) have become one of the most popular data generation techniques, as they can be used in big data settings. This research presents a survey of data augmentation using various GAN variants in the credit card fraud detection domain. We offer a comprehensive summary of several peer-reviewed research papers on GAN-based synthetic data generation techniques for fraud detection in the financial sector. The survey also covers various solutions proposed by different researchers for balancing imbalanced classes. Finally, this work points out the limitations of the most recent research articles, identifies open research issues, and proposes solutions to address these problems.

Improving Classification Performance in Credit Card Fraud Detection by Using New Data Augmentation

2023

In many industrialized and developing nations, credit cards are one of the most widely used methods of payment for online transactions. The invention of the credit card has streamlined, facilitated, and enhanced internet transactions. However, it has also given criminals more opportunities to commit fraud, which has raised the rate of fraud. Credit card fraud has a concerning global impact; many businesses and ordinary users have lost millions of US dollars as a result. Because the number of transactions is large, many businesses and organizations rely heavily on machine learning techniques to automatically classify or identify fraudulent transactions. As the performance of machine learning techniques depends greatly on the quality of the training data, imbalance in the data is not a trivial issue. In general, only a small percentage of the transactions in the data are fraudulent, which greatly affects the performance of machine learning classifiers. To deal with the rarity of fraudulent occurrences, this paper investigates a variety of data augmentation techniques for the imbalanced data problem and introduces a new data augmentation model, K-CGAN, for credit card fraud detection. Several of the main classification techniques are then used to evaluate the performance of the augmentation techniques. The results show that B-SMOTE, K-CGAN, and SMOTE have the highest Precision and Recall compared with other augmentation methods. Among those, K-CGAN has the highest F1 Score and Accuracy.
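As a rough illustration of the oversampling idea behind SMOTE, one of the baselines named in the abstract above, the following minimal sketch creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified stand-in and not the paper's K-CGAN or B-SMOTE; the function name `smote_like`, its parameters, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (made up): 4 fraud rows vs. 20 legitimate rows, two features each.
fraud = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.3)]
legit = [(5.0 + 0.1 * i, 5.0 - 0.1 * i) for i in range(20)]

# Add enough synthetic fraud rows to match the size of the majority class.
new_fraud = smote_like(fraud, n_new=len(legit) - len(fraud))
balanced_fraud = fraud + new_fraud
print(len(balanced_fraud), len(legit))
```

Because every synthetic point lies on a segment between two real minority points, this kind of oversampling cannot produce samples outside the region already covered by the minority class. GAN-based generators such as those studied in these papers aim to learn the minority distribution instead.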

Effective Feature Engineering and Classification of Breast Cancer Diagnosis: A Comparative Study

2023

BioMedInformatics

Breast cancer is among the most common cancers found in women and a major cause of cancer-related deaths, making it a severe public health issue. Early prediction of breast cancer can increase the chances of survival and promote early medical treatment. Moreover, the accurate classification of benign cases can prevent patients from undergoing unnecessary treatments. Therefore, the accurate and early diagnosis of breast cancer and its classification into benign or malignant classes are much-needed research topics. This paper presents an effective feature engineering method to extract and modify features from data and examines its effects on different classifiers using the Wisconsin Breast Cancer Diagnosis Dataset. We then use the engineered features to compare six popular machine-learning models for classification. The models compared were Logistic Regression, Random Forest, Decision Tree, K-Neighbors, Multi-Layer Perceptron (MLP), and XGBoost. The results showed that the Decision Tree model, when applied to the proposed feature engineering, was the best performing, achieving an average accuracy of 98.64%.

Generating Synthetic Data for Credit Card Fraud Detection Using GANs

2022

Deep learning-based classifiers for object classification and recognition have been utilized in various sectors. However, according to research papers, deep neural networks achieve better performance on balanced datasets than on imbalanced ones. Datasets are often imbalanced because fraud cases are rare in production environments. Deep generative approaches, such as GANs, have been applied as an efficient method to augment high-dimensional data. In this research study, classifiers based on Random Forest, Nearest Neighbor, Logistic Regression, MLP, and AdaBoost were trained using our novel K-CGAN approach and compared with other oversampling approaches, achieving higher F1 scores. Experiments demonstrate that the classifiers trained on the augmented set achieved far better performance than the same classifiers trained on the original data, producing an effective fraud detection mechanism. Furthermore, this research demonstrates the problem with data imbalance and introduces a novel model that is able to generate high-quality synthetic data.
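To make the role of the F1 score in the abstract above concrete, here is a minimal sketch, using made-up labels, of why accuracy alone can mislead on imbalanced data while precision, recall, and F1 expose the problem. The helper `precision_recall_f1` and the toy labels are assumptions for illustration and are not taken from the paper.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 2 fraud cases (1) among 100 transactions.
y_true = [1, 1] + [0] * 98

# A classifier that labels every transaction as legitimate.
always_legit = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, always_legit)) / len(y_true)

print(accuracy)                                    # 0.98
print(precision_recall_f1(y_true, always_legit))   # (0.0, 0.0, 0.0)
```

The trivial classifier scores 98% accuracy yet detects no fraud at all, which is why comparisons of augmentation methods in this setting are usually reported in terms of precision, recall, and F1 on the fraud class.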

Comparative Analysis of Machine Learning Algorithms using GANs through Credit Card Fraud Detection

2022

In recent years, fraudulent credit card transactions have become a major problem. These transactions not only cause huge monetary losses for commercial banks and financial institutions, but also bring stress and trouble to the lives of customers. Furthermore, the problem is growing over time, and the monetary loss is expected to increase significantly. However, efficient fraud detection and prevention measures can reduce the monetary loss caused by financial fraud. Credit card fraud detection has gained much interest from academia. Generative Adversarial Networks (GANs) are an effective class of generative approaches that have been able to generate synthetic data to assist with the classification of fraudulent credit card activity. In this research study, we compare the architectures of various GAN models, demonstrating the evolution of these models. We observe that GANs have received much attention from researchers and have attained promising results in the field of credit card fraud detection.