2022
DOI: 10.1142/s0218213022500373
Fraud Detection Using Large-scale Imbalance Dataset

Abstract: In machine learning, an imbalanced classification problem refers to a dataset in which the classes are not evenly distributed. This problem commonly occurs when attempting to classify data in which the distribution of labels or classes is not uniform. Resampling methods, which add samples to the minority class or drop samples from the majority class, are often considered the best solution to this problem. The focus of this study is to propose a framework to handle any …
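The excerpt does not show the paper's framework itself, but the two resampling directions the abstract names can be sketched in plain Python. The helpers `random_oversample` and `random_undersample` below are illustrative stand-ins, not the authors' method:

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Drop majority-class rows at random until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy fraud-like data: 6 legitimate (0) vs 2 fraudulent (1) transactions.
X = [[i] for i in range(8)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
Xo, yo = random_oversample(X, y)   # both classes grow to 6
Xu, yu = random_undersample(X, y)  # both classes shrink to 2
```

Oversampling preserves every majority example at the cost of duplicated minority rows; undersampling discards majority information but keeps the training set small, which is why large-scale fraud datasets often favor it.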

Cited by 6 publications (3 citation statements); references 0 publications.
“…Several studies have explored imbalanced datasets in the context of different fraudulent cases, utilizing various resampling techniques and evaluation metrics (Rubaidi et al., 2022; Chen et al., 2021; Li et al., 2021; Mrozek et al., 2020; Bauder et al., 2018). Among the techniques used for handling imbalanced data were Random Undersampling (RUS), Random Oversampling (ROS), SMOTE, Borderline-SMOTE, Adaptive Synthetic Sampling (ADASYN), and cost-sensitive learning.…”
Section: Handling of Imbalanced Data
confidence: 99%
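As a rough illustration of how SMOTE differs from plain random oversampling, the stdlib-only sketch below generates synthetic minority points by interpolating between nearby minority samples. The function `smote_like` is a simplified stand-in, not the reference SMOTE implementation:

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neighbours = sorted((q for q in minority if q is not p),
                            key=lambda q: math.dist(p, q))[:k]
        q = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(p, q)])
    return synthetic

minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
new_pts = smote_like(minority, n_new=4)
```

Because each synthetic point is a convex combination of two real minority samples, it stays inside the minority region rather than being an exact duplicate, which is the property that distinguishes SMOTE from ROS.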
“…The classifier's performance is enhanced by integrating multiple classifiers. Other studies have employed NearMiss undersampling to address imbalances in financial crime datasets (Mqadi et al., 2021; Rubaidi et al., 2022). The studies found that machine learning algorithms performed very well using the NearMiss undersampling technique.…”
Section: NearMiss
confidence: 99%
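The NearMiss idea mentioned above can be sketched as follows. This is a simplified NearMiss-1 variant (keep the majority samples whose average distance to their k nearest minority samples is smallest), not the implementation used in the cited studies:

```python
import math

def nearmiss1(majority, minority, n_keep, k=2):
    """Keep the n_keep majority samples with the smallest mean distance
    to their k nearest minority-class samples (NearMiss-1 heuristic)."""
    def score(p):
        dists = sorted(math.dist(p, q) for q in minority)
        return sum(dists[:k]) / k
    return sorted(majority, key=score)[:n_keep]

# Two majority points sit near the minority cluster; two are far away.
majority = [[1.0, 1.0], [0.0, 0.0], [5.0, 5.0], [9.0, 9.0]]
minority = [[1.1, 1.0], [0.9, 1.2]]
kept = nearmiss1(majority, minority, n_keep=2)
```

Unlike random undersampling, this keeps the majority examples closest to the decision boundary, which is where a fraud classifier needs the most resolution.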
“…On the right is the decoder, which also has the two layers of the encoder, but between them there is an Encoder-Decoder Attention, which is used to help the decoder to focus on the relevant parts of the input sentence [57]. Training deep learning models [58] for fraud detection requires substantial amounts of labeled data, which can be a challenge due to the imbalance between normal and fraudulent instances [59]. Data preprocessing techniques, including resampling, oversampling, and undersampling, are employed to address this imbalance [60].…”
Section: Deep Learning Approaches to Fraud Detection
confidence: 99%
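Cost-sensitive training is the usual alternative to resampling when fitting deep models on imbalanced data: rare classes get larger loss weights instead of duplicated rows. A generic sketch (not the setup of references [58]–[60]) using inverse-frequency class weights in a weighted binary cross-entropy:

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """w_c = N / (num_classes * n_c): the rarer the class, the larger its weight."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * nc) for c, nc in counts.items()}

def weighted_bce(y_true, p_pred, weights):
    """Class-weighted binary cross-entropy averaged over a batch."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        total += weights[y] * -(math.log(p) if y == 1 else math.log(1 - p))
    return total / len(y_true)

labels = [0] * 95 + [1] * 5            # 95:5 imbalance, as in typical fraud data
w = inverse_frequency_weights(labels)  # fraud class weighted ~19x heavier
```

The same weight dictionary plugs directly into the `class_weight` arguments that common deep learning frameworks expose, so no change to the data pipeline is needed.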