Unbalanced data classification using extreme outlier elimination and sampling techniques for fraud detection

Padmaja; Dhulipalla; Bapi; Krishna

doi:10.1109/adcom.2007.74

Cited by 49 publications

(19 citation statements)

References 11 publications

(10 reference statements)

“…To solve the problem of highly unbalanced database and overlapping in [7] propose a new approach called elimination of points at the end of the border and hybrid sampling technique. The concept of k-neighbors is used as a cleaning method to remove data points from the end of the border in minority regions.…”

Section: Related Work (mentioning)

confidence: 99%
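
The citation statement above describes a k-neighbors cleaning step followed by hybrid sampling. The following is a minimal sketch of that general idea, not the cited paper's exact procedure. It assumes a binary problem, Euclidean distance, and a removal rule (a minority point is dropped when none of its k nearest neighbours is a minority point) chosen here purely for illustration. The function names `knn_clean_minority` and `hybrid_sample` are hypothetical.

```python
import math
import random

def knn_clean_minority(points, labels, minority, k=3):
    """Drop minority points whose k nearest neighbours contain no minority point.

    Assumption: this removal rule is illustrative only.
    """
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        if y != minority:
            keep.append(i)  # majority points are never removed here
            continue
        # Indices of the k nearest other points by Euclidean distance.
        nearest = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        if any(labels[j] == minority for j in nearest):
            keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]

def hybrid_sample(points, labels, minority, seed=0):
    """Undersample the majority and oversample the minority to a common size.

    Assumes the minority group is the smaller one; the target size is the
    midpoint of the two group sizes (an arbitrary illustrative choice).
    """
    rng = random.Random(seed)
    mino = [p for p, y in zip(points, labels) if y == minority]
    majo = [(p, y) for p, y in zip(points, labels) if y != minority]
    if not mino or not majo:
        raise ValueError("need at least one point in each group")
    target = (len(mino) + len(majo)) // 2
    majo_kept = rng.sample(majo, min(target, len(majo)))          # undersampling
    extra = rng.choices(mino, k=max(0, target - len(mino)))       # oversampling by duplication
    out = majo_kept + [(p, minority) for p in mino + extra]
    return [p for p, _ in out], [y for _, y in out]
```

Cleaning before resampling means that isolated minority points are not duplicated by the oversampling step.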

Outlier Detection Applying an Innovative User Transaction Modeling with Automatic Explanation

Perez

Lavalle

2011

2011 IEEE Electronics, Robotics and Automotive Mechanics Conference

We present a method to detect outlier or exceptional transaction records by applying an innovative user modeling approach. We use a large financial database to validate our method. The method has two stages. The first stage is user transaction modeling, which derives user behavior from historic transactions based on categorical or numerical attributes. The second stage is monitoring, in which a new transaction is compared against the corresponding user model to determine whether the transaction is unusual (non-standard, fraudulent or suspicious) and to assign it a category (e.g. normal, abnormal, suspicious). The method also provides the percentage of ownership for these categories. In experiments conducted with a very large financial database, encouraging results were observed in the field of applied Business Intelligence, in particular for financial fraud detection and in general for outlier detection. The novelty of this method is that it provides the user with an automatic explanation of the exception level of the new transaction.
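
The two stages described in this abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes one numerical attribute (`amount`) and one categorical attribute (`category`), a z-score test and a frequency test with arbitrary thresholds, and a simple rule that maps the number of triggered tests to the labels normal, abnormal and suspicious. All names and thresholds are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def build_user_model(history):
    """Stage 1: summarise one user's historic transactions.

    `history` is a list of dicts with the hypothetical keys 'amount' and 'category'.
    """
    amounts = [t["amount"] for t in history]
    cats = Counter(t["category"] for t in history)
    return {
        "mean": mean(amounts),
        "std": pstdev(amounts),
        "cat_freq": {c: n / len(history) for c, n in cats.items()},
    }

def check_transaction(model, tx, z_limit=3.0, min_freq=0.05):
    """Stage 2: compare a new transaction with the model.

    Returns (label, reasons); the reasons list serves as a plain-text explanation.
    """
    reasons = []
    if model["std"] > 0:
        z = abs(tx["amount"] - model["mean"]) / model["std"]
        if z > z_limit:
            reasons.append(f"amount is {z:.1f} standard deviations from the user's mean")
    elif tx["amount"] != model["mean"]:
        reasons.append("amount differs from the user's constant historic amount")
    freq = model["cat_freq"].get(tx["category"], 0.0)
    if freq < min_freq:
        reasons.append(f"category '{tx['category']}' is rare in the user's history")
    # Illustrative rule: 0 reasons -> normal, 1 -> abnormal, 2 -> suspicious.
    label = ("normal", "abnormal", "suspicious")[len(reasons)]
    return label, reasons
```

Returning the reasons alongside the label mirrors the abstract's emphasis on explaining why a transaction was flagged.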

“…First, fraud is a rare event because the legitimate claims almost always outnumber the fraudulent ones. For instance, more than 80% of the papers reviewed in Phua et al () have skewed data with less than 30% fraud. The sparsity of the fraud data can be addressed with methods such as non‐negative matrix factorisation, singular value decomposition and principal component analysis (Zhu et al , ).…”

Section: Medical Claims Data (mentioning)

confidence: 99%

“…Despite the wide adoption of fraud detection methods in these domains, the level of attention given to medical fraud assessment has been relatively limited (Phua et al , ). Some of the aforementioned methods are applicable for detection of fraudulent medical claims.…”

Section: Introduction (mentioning)

confidence: 99%

Statistical Medical Fraud Assessment: Exposition to an Emerging Field

Ekin

Ieva

Ruggeri

et al. 2018

Int Statistical Rev

Health care expenditures constitute a significant portion of governmental budgets. The percentage of fraud, waste and abuse within that spending has increased over the years. This paper introduces the emerging area of statistical medical fraud assessment, which is becoming crucial for handling the increasing size and complexity of medical programmes. An overview of fraud types and detection is followed by a description of medical claims data. The utilisation of sampling, overpayment estimation and data mining methods in medical fraud assessment is presented. Recent unsupervised methods are illustrated with real-world data. Finally, the paper introduces potential future research areas such as integrated decision-making approaches and Bayesian methods, and concludes with an overall discussion. The main goal of this exposition is to increase awareness of this important area among a broader audience of statisticians.

“…The predictions of the majority class have a high possibility to get good performance, whereas the predictions of minority classes generally have poor performance. Class imbalance is prevalent in many real-world applications [12,40,41] such as bioinformatics, anomaly detection, intrusion detection, fraud detection and especially in medical diagnosis. These applications usually focus on the minority class.…”

Section: Introduction (mentioning)

confidence: 99%

Cluster-based sampling of multiclass imbalanced data

Prachuabsupakij

Soonthornphisaj

2014

IDA

The aim of this paper is to improve classification performance on multiclass imbalanced datasets. We introduce a new resampling approach based on Clustering with sampling for Multiclass Imbalanced classification using Ensemble (C-MIEN). C-MIEN uses clustering to create a new training set for each cluster. The new training sets consist of relabeled instances with similar characteristics. This step reduces the number of classes, which makes the problem less complex for C-MIEN to solve. After that, we apply two resampling techniques (oversampling and undersampling) to rebalance the class distribution. Finally, once the class distribution of each training set is balanced, ensemble approaches combine the models obtained with the proposed method through a majority vote. Moreover, we carefully design the experiments and analyze the behavior of C-MIEN with different parameters (imbalance ratio and number of classifiers). The experimental results show that C-MIEN achieved higher performance than state-of-the-art methods.

This paper is concerned with improving classification performance on multiclass imbalanced datasets, a setting that is even more complicated than the two-class one. Moreover, a higher degree of class imbalance may increase the difficulty of multiclass classification. Solutions for two-class problems are not directly applicable to multiclass cases. One well-known method is the decomposition technique, which decomposes the multiclass dataset into a series of binary classification problems and then uses a two-class learner for each classification task [13,18,32,34], such as One-Against-One (OAO) [60] and One-Against-All (OAA) [7]. Several decomposition methods use an ensemble approach to combine the models obtained from the binary classifiers. However, using decomposition with a sampling technique is not practical for this problem because it is time consuming. Moreover, in the case of OAA, the results of each class label assignment are not comparable because the decision can be made differently for different classes [54]. Hence, reducing the number of classes and comparing labels becomes a key issue for applying the resampling technique in multiclass cases.

In this paper, we develop a resampling algorithm for the multiclass imbalance problem based on a clustering approach, namely C-MIEN. First, k-means is used to split the set of instances into two clusters. For each cluster, hybrid sampling methods are used. Then, the final training sets (in which classes are balanced) are used to build an ensemble. Finally, the prediction is obtained by combining the results from both clusters through a majority vote. C-MIEN is an extension of our previous works [42][43][44] that focused on different classifiers in the classification part. In our previous works [42,43], we did not apply an ensemble in the classification part, and the re-balancing process was different from the one in this paper. Moreover, we carefully design the experiments and analyze the behavior of C-MIEN with different parameters (imbalance ratio and number of c...
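
The clustering and resampling steps described above (split the instances into two clusters, then apply hybrid sampling within each cluster) can be sketched as follows. This is an illustrative reading only, not the C-MIEN implementation. It omits the relabeling, the ensemble and the majority vote, uses a plain Lloyd's k-means with a deterministic initialisation chosen here for reproducibility, and balances each cluster to its mean class size, which is an assumption. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def two_means(points, iters=20):
    """Split points into two clusters with Lloyd's algorithm; returns a 0/1 list."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))  # init: farthest point from the first
    centroids = [c0, c1]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        for c in (0, 1):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign

def rebalance(samples, seed=0):
    """Hybrid sampling: bring every class in `samples` to the mean class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for p, y in samples:
        by_class[y].append(p)
    target = round(len(samples) / len(by_class))
    out = []
    for y, pts in by_class.items():
        if len(pts) >= target:
            chosen = rng.sample(pts, target)                        # undersample
        else:
            chosen = pts + rng.choices(pts, k=target - len(pts))    # oversample
        out.extend((p, y) for p in chosen)
    return out

def cluster_then_rebalance(points, labels):
    """Return {cluster_id: balanced list of (point, label)} for non-empty clusters."""
    assign = two_means(points)
    return {
        c: rebalance([(p, y) for p, y, a in zip(points, labels, assign) if a == c])
        for c in (0, 1)
        if c in assign
    }
```

Each balanced training set returned here could then be passed to a separate classifier, with the per-cluster predictions combined by vote as the text describes.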
