The problem of preprocessing transaction data for supervised fraud classification is considered. It is impractical to present an entire series of transactions to a fraud detection system, partly because of the very high dimensionality of such data but also because of the heterogeneity of the transactions. Hence, a framework for transaction aggregation is considered and its effectiveness is evaluated against transaction-level detection, using a variety of classification methods and a realistic cost-based performance measure. These methods are applied in two case studies using real data. Transaction aggregation is found to be advantageous in many but not all circumstances. Also, the length of the aggregation period has a large impact upon performance. Aggregation seems particularly effective when a random forest is used for classification. Moreover, random forests were found to perform better than other classification methods, including SVMs, logistic regression and KNN. Aggregation also has the advantage of not requiring precisely labeled data and may be more robust to the effects of population drift.
Learning the network structure of a large graph is computationally demanding,
and dynamically monitoring the network over time for any changes in structure
threatens to be more challenging still. This paper presents a two-stage method
for anomaly detection in dynamic graphs: the first stage uses simple, conjugate
Bayesian models for discrete time counting processes to track the pairwise
links of all nodes in the graph to assess normality of behavior; the second
stage applies standard network inference tools on a greatly reduced subset of
potentially anomalous nodes. The utility of the method is demonstrated on
simulated and real data sets.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS329 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.