The last years have seen the development of many credit scoring models for assessing the creditworthiness of loan applicants. Traditional credit scoring methodology has involved the use of statistical and mathematical programming techniques such as discriminant analysis, linear and logistic regression, linear and quadratic programming, or decision trees. However, the importance of credit grant decisions for financial institutions has caused growing interest in using a variety of computational intelligence techniques. This paper concentrates on evolutionary computing, which is viewed as one of the most promising paradigms of computational intelligence. Taking into account the synergistic relationship between the communities of Economics and Computer Science, the aim of this paper is to summarize the most recent developments in the application of evolutionary algorithms to credit scoring by means of a thorough review of scientific articles published during the period 2000-2012.
In real-life credit scoring applications, the case in which the class of defaulters is under-represented in comparison to the class of non-defaulters is a very common situation, but it has still received little attention. The present paper investigates the suitability and performance of several resampling techniques when applied in conjunction with statistical and artificial intelligence prediction models over five real-world credit data sets, which have artificially been modified to derive different imbalance ratios (proportion of defaulters and non-defaulters examples). Experimental results demonstrate that the use of resampling methods consistently improves the performance given by the original imbalanced data. Besides, it is also important to note that in general, over-sampling techniques perform better than any under-sampling approach.
Credit risk and corporate bankruptcy prediction has widely been studied as a binary classification problem using both advanced statistical and machine learning models. Ensembles of classifiers have demonstrated their effectiveness for various applications in finance using data sets that are often characterized by imperfections such as irrelevant features, skewed classes, data set shift, and missing and noisy data. However, there are other corruptions in the data that might hinder the prediction performance mainly on the default or bankrupt (positive) cases, where the misclassification costs are typically much higher than those associated to the non-default or non-bankrupt (negative) class. Here we characterize the complexity of 14 real-life financial databases based on the different types of positive samples. The objective is to gain some insight into the potential links between the performance of classifier ensembles (BAGGING, AdaBoost, random subspace, DECORATE, rotation forest, random forest, and stochastic gradient boosting) and the positive sample types. Experimental results reveal that the performance of the ensembles indeed depends on the prevalent type of positive samples.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.