Money laundering is the key mechanism criminals use to inject the proceeds of crime into the financial system. The primary responsibility for detecting suspicious activity related to money laundering lies with financial institutions. Most current systems in these institutions are rule-based and ineffective (over 90% false positives). The available data science-based anti-money laundering (AML) models intended to replace the existing rule-based systems work on customer relationship management (CRM) features and time characteristics of transaction behaviour. Because there are thousands of possible account features, customer features, and combinations of them, it is challenging to perform feature engineering that achieves reasonable accuracy. To improve the detection performance of suspicious transaction monitoring systems for AML, in this article we introduce a novel feature set based on time-frequency analysis that uses 2-D representations of financial transactions. Random forest is used as the machine learning method, and simulated annealing is adopted for hyperparameter tuning. The designed algorithm is tested on real banking data, demonstrating its efficacy in practically relevant environments. It is shown that time-frequency characteristics are discriminatory features for suspicious and non-suspicious entities. These features therefore substantially improve the area under the curve (by over 1%) of existing data science-based transaction monitoring systems. Using time-frequency features alone, a false positive rate of 14.9% is achieved, with an F-score of 59.05%. When combined with transaction and CRM features, the false positive rate is 11.85% and the F-score improves to 74.06%.
INDEX TERMS Anomaly detection, anti-money laundering, compliance, random forest algorithm, time-frequency analysis, transaction monitoring.
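As a rough illustration of what a 2-D time-frequency representation of a transaction series might look like, the sketch below computes a short-time Fourier transform magnitude map over a daily amount series and reduces it to a few summary features. The abstract does not say which time-frequency transform or which features the authors use, so the STFT, the window and hop sizes, and the names `stft_magnitude` and `tf_features` are all assumptions made for illustration only.

```python
import cmath
import math


def stft_magnitude(series, window=8, hop=4):
    """Return a 2-D list: one row per time frame, one column per frequency bin.

    Assumption: a plain short-time Fourier transform with a Hann taper.
    """
    frames = []
    for start in range(0, len(series) - window + 1, hop):
        segment = series[start:start + window]
        # Remove the segment mean so the DC level does not dominate.
        mean = sum(segment) / window
        tapered = [
            (x - mean) * 0.5 * (1 - math.cos(2 * math.pi * n / (window - 1)))
            for n, x in enumerate(segment)
        ]
        row = []
        for k in range(window // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / window)
                for n, x in enumerate(tapered)
            )
            row.append(abs(coeff))
        frames.append(row)
    return frames


def tf_features(spec):
    """Summarise the 2-D map as per-bin energy shares (a hypothetical feature choice)."""
    n_bins = len(spec[0])
    total = sum(sum(row) for row in spec) or 1.0  # avoid dividing by zero
    band_energy = [sum(row[k] for row in spec) / total for k in range(n_bins)]
    peak_bin = max(range(n_bins), key=lambda k: band_energy[k])
    return {"band_energy": band_energy, "peak_bin": peak_bin}


# Synthetic daily amounts that repeat every 4 days (illustrative data, not real).
amounts = [50 + 50 * math.cos(2 * math.pi * day / 4) for day in range(32)]
spec = stft_magnitude(amounts)
features = tf_features(spec)
```

With a window of 8 days, a 4-day cycle falls in frequency bin 2, so `features["peak_bin"]` picks out that bin. In a full pipeline such features would be passed to a classifier such as a random forest alongside transaction and CRM features.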
We propose a model of driver perception suited to microscopic, agent-based traffic simulations. The model includes both top-down and bottom-up perception, and takes into account the limited amount of perceptual resources that gain access to short-term memory. The driving task is split into sub-tasks, which can be activated in parallel (e.g. car following and crossroads passing). Perceived entities (percepts) as well as sub-tasks are ranked by their subjective value, and because perception is bounded, only the more "valuable" percepts are sent to the decision module of the cognitive model. The competition among percepts to gain access to short-term memory simulates attentional processes. A computational implementation of the model is proposed for the driver, using agent-based modeling. It is implemented in a traffic simulation environment and allows the driver-agent to manage conflicts and longitudinal space in the middle of the crossroads. In this way, we improve the realism of the simulation. Furthermore, this model can lead to a new way of identifying and explaining near accidents. We illustrate some benefits for a microscopic traffic simulation at crossroads in two situations. The first
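The competition among percepts for a bounded short-term memory described in this abstract can be sketched as a top-k selection. The abstract does not specify how subjective value is computed, so the product of a bottom-up salience and a top-down sub-task priority, the `Percept` class, and the `select_percepts` function are hypothetical choices for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Percept:
    name: str
    subtask: str      # sub-task the percept is relevant to
    salience: float   # bottom-up contribution (assumed scale 0..1)


def select_percepts(percepts, subtask_priority, capacity):
    """Keep only the `capacity` most valuable percepts.

    Assumption: value = bottom-up salience * top-down sub-task priority.
    Percepts tied to no active sub-task get a priority of 0.
    """
    def value(p):
        return p.salience * subtask_priority.get(p.subtask, 0.0)

    return heapq.nlargest(capacity, percepts, key=value)


# Two sub-tasks active in parallel, as in the abstract's example.
active_subtasks = {"car_following": 0.7, "crossroads_passing": 1.0}
scene = [
    Percept("lead_car", "car_following", 0.6),
    Percept("crossing_vehicle", "crossroads_passing", 0.9),
    Percept("billboard", "none", 0.8),
    Percept("pedestrian", "crossroads_passing", 0.5),
]
attended = select_percepts(scene, active_subtasks, capacity=2)
attended_names = [p.name for p in attended]
```

Here only the two highest-valued percepts reach the decision module; the billboard is salient but irrelevant to any active sub-task, so it loses the competition.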
Missing data is a common problem that affects data clustering quality. Most real-life datasets have missing data, which in turn affects clustering tasks. This chapter investigates the appropriate data treatment methods for varying missing data scarcity distributions, including gamma, Gaussian, and beta distributions. The analyzed data imputation methods include mean, hot-deck, regression, k-nearest neighbor, expectation maximization, and multiple imputation. To reveal the proper methods for dealing with missing data, data mining tasks such as clustering are used for evaluation. Through experimental studies, this chapter identifies the correlation between missing data imputation methods and missing data distributions for clustering tasks. The results of the experiments indicate that the expectation maximization and k-nearest neighbor methods provide the best results for varying missing data scarcity distributions.
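Two of the imputation methods listed above, mean and k-nearest neighbor, can be sketched in a few lines. This is a minimal illustration rather than the chapter's implementation: the function names, the use of `None` for missing values, and the distance over shared observed columns are assumptions, and every column is assumed to have at least one observed value.

```python
import math


def mean_impute(rows):
    """Replace each missing value (None) with the mean of its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Replace each missing value with the average over the k nearest rows
    that have that column observed."""
    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is None:
                donors = sorted(
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )[:k]
                new_row[j] = sum(val for _, val in donors) / len(donors)
        filled.append(new_row)
    return filled


# Illustrative data with two loose groups and one missing value.
data = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [5.2, 11.0]]
by_mean = mean_impute(data)
by_knn = knn_impute(data, k=1)
```

In this toy data the mean fills the gap with the overall column average, while the nearest-neighbor fill borrows from the row closest in the observed column, which keeps the imputed point inside its own group. That difference is the kind of effect a downstream clustering evaluation would pick up.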