Credit risk prediction is an effective way of evaluating whether a potential borrower will repay a loan, particularly in peer-to-peer lending where class imbalance problems are prevalent. However, few credit risk prediction models for social lending consider imbalanced data and, further, the best resampling technique to use with imbalanced data is still controversial. In an attempt to address these problems, this paper presents an empirical comparison of various combinations of classifiers and resampling techniques within a novel risk assessment methodology that incorporates imbalanced data. The credit predictions from each combination are evaluated with a G-mean measure to avoid bias towards the majority class, which has not been considered in similar studies. The results reveal that combining random forest and random under-sampling may be an effective strategy for calculating the credit risk associated with loan applicants in social lending markets.
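To illustrate why a G-mean measure avoids bias towards the majority class, the following minimal sketch assumes the common definition of G-mean as the geometric mean of sensitivity and specificity; the abstract does not state the exact formula used. The function name `g_mean` and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed definition: sqrt(TP / (TP + FN) * TN / (TN + FP)).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    sensitivity = tp / (tp + fn)  # recall on the positive (minority) class
    specificity = tn / (tn + fp)  # recall on the negative (majority) class
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: 1 = defaulted (minority), 0 = repaid (majority).
y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10                     # predicts "repaid" for everyone
mixed = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]         # catches one of two defaults

score_majority = g_mean(y_true, always_majority)
score_mixed = g_mean(y_true, mixed)
```

A classifier that always predicts the majority class is 80% accurate on this toy data, yet its G-mean is zero because its sensitivity is zero. This is the property that makes the measure suitable for imbalanced lending data.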
Mobile telematics is a relatively new innovation that involves collecting data on driving behavior using the internal sensors in a smartphone rather than from an in-vehicle data recorder. However, telematics data are usually not labeled, which makes extracting driving patterns from them very difficult. Therefore, unsupervised learning algorithms play an important role in this field. In addition, most current research is based on datasets developed in a laboratory or from site investigations and questionnaires, which are very different from real-world driving behaviors. To advance unsupervised learning techniques in this field, and to fill the gap in findings based on real-world data, we have developed an unsupervised pattern recognition framework for mobile telematics data. The framework comprises three main components: a self-organizing map (SOM), a nine-layer deep autoencoder, and partitive clustering algorithms. The SOM algorithm reduces the complexity of the data, the deep autoencoder extracts the features, and the clustering algorithm groups driving events with similar patterns into behaviors. Further, given that clustering with mobile telematics data is an under-researched area, we undertook an empirical comparison of five well-known clustering algorithms to determine the strengths and weaknesses of each method and which is best suited to categorizing driving styles. The study was conducted with a real-world insurance dataset containing 500,000 journeys by 2500 drivers, and the results were evaluated against three measures: the Davies-Bouldin index, the Calinski-Harabasz index, and execution time. Overall, we find that k-means clustering and a self-organizing map were able to extract more accurate patterns than the other algorithms. A statistical analysis of the 29 clusters produced by SOM and k-means revealed 29 unique driving styles, all of which can be found in the transportation literature.
The results from the study, with support from the corresponding literature review, demonstrate the efficacy of the presented framework in unsupervised settings. Additionally, the results provide a basis for developing a future risk analysis and automatic decision support system for usage-based insurance companies.
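As an illustration of one of the evaluation measures named above, the following sketch computes the Davies-Bouldin index using its standard definition: the average, over clusters, of the worst-case ratio of within-cluster scatter to between-centroid distance. Lower values indicate more compact, better-separated clusters. The function name and the four-point toy dataset are hypothetical and unrelated to the insurance data used in the study.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for a labelled set of points (lower is better).

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    # Centroid of each cluster: coordinate-wise mean of its members.
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter of each cluster: mean distance from members to the centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical toy data: two tight groups far apart on the x-axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_labels = [0, 0, 1, 1]  # groups points that are close together
bad_labels = [0, 1, 0, 1]   # mixes points from both groups

db_good = davies_bouldin(points, good_labels)
db_bad = davies_bouldin(points, bad_labels)
```

On this toy data the sensible grouping yields a much lower index than the mixed grouping, which is how the measure can rank competing clustering algorithms without ground-truth labels.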
Credit scoring is one of the most important topics in the banking sector, and credit scoring models are widely used to facilitate credit assessment. In this paper, the locally linear model tree algorithm (LOLIMOT) is applied to evaluate its performance in predicting a customer's credit status. The algorithm is adapted to the credit scoring domain by means of data fusion and feature selection techniques. Two real-world credit data sets, Australian and German, from the UCI Machine Learning Repository were selected to demonstrate the performance of our new classifier. The analytical results indicate that the improved LOLIMOT significantly increases prediction accuracy.
Mobile app development in recent years has resulted in new products and features to improve human life. Mobile telematics is one such development that encompasses multidisciplinary fields for transportation safety. The application of mobile telematics has been explored in many areas, such as insurance and road safety. However, to the best of our knowledge, its application in gender detection has not been explored. This paper proposes a Choquet fuzzy integral vertical bagging classifier that detects gender through mobile telematics. In this model, different random forest classifiers are trained by randomly generated features with rough set theory, and the top three classifiers are fused using the Choquet fuzzy integral. The model is implemented and evaluated on a real dataset. The empirical results indicate that the Choquet fuzzy integral vertical bagging classifier outperforms other classifiers.
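The fusion step can be sketched with a discrete Choquet integral over the confidence scores of three classifiers. This is a minimal illustration only: the classifier names, the scores, and the fuzzy measure values below are all hypothetical, and the abstract does not say how the paper derives its fuzzy measure.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of `scores` with respect to `measure`.

    scores:  dict mapping a source name to its confidence in [0, 1].
    measure: dict mapping frozenset of source names to a value in [0, 1];
             it should be monotone, with the full set mapped to 1.
    """
    # Rank sources from most to least confident.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    total = 0.0
    coalition = set()
    for idx, (name, value) in enumerate(ranked):
        coalition.add(name)
        # Next lower score, or 0 after the last source.
        next_value = ranked[idx + 1][1] if idx + 1 < len(ranked) else 0.0
        # Weight each score gap by the measure of the sources at or above it.
        total += (value - next_value) * measure[frozenset(coalition)]
    return total


# Hypothetical fuzzy measure over three classifiers (values are assumptions).
measure = {
    frozenset({"rf_a"}): 0.4,
    frozenset({"rf_b"}): 0.3,
    frozenset({"rf_c"}): 0.2,
    frozenset({"rf_a", "rf_b"}): 0.8,
    frozenset({"rf_a", "rf_c"}): 0.6,
    frozenset({"rf_b", "rf_c"}): 0.5,
    frozenset({"rf_a", "rf_b", "rf_c"}): 1.0,
}

# Hypothetical per-classifier confidence that a driver belongs to one class.
scores = {"rf_a": 0.9, "rf_b": 0.6, "rf_c": 0.3}
fused = choquet_integral(scores, measure)
```

Unlike a weighted average, the measure assigns a weight to every subset of classifiers, so the fused score can reflect interactions between classifiers rather than treating them as independent. The result always lies between the lowest and highest individual score.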