Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines

Khemakhem, Sihem; Saïd, F.; Boujelbène, Younés

doi:10.1108/jm2-01-2017-0002

Cited by 54 publications

(39 citation statements)

References 53 publications


“…Compared with other linear models, this model showed better prediction accuracy because it reduces the influence of irrelevant features. In (Khemakhem et al 2018), the authors assessed credit risk using linear regression, SVM and neural networks. Their work compares the performance indicators of the prediction methods before and after data balancing.…”

Section: Related Work (mentioning)

confidence: 99%

“…Thus, in our research, we use a unique dataset from the credit registry, and we present all the necessary steps from data collection to prediction and evaluation. On this dataset, we train and compare the most widely used machine-learning algorithms; additionally, we consider a sampling strategy for data balancing, similar to the approach in (Khemakhem et al 2018). Even though the dataset that we exploit in this paper adds value to this research and its results, the main drawback is that we did not compare the results with any other similar dataset, because it is impossible to obtain such datasets from neighboring or any other central banks.…”

Section: Related Work (mentioning)

confidence: 99%
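The snippets mention balancing the data with a sampling strategy before training. The simplest such strategy is random over-sampling, sketched below under stated assumptions: the function name `random_oversample`, the toy features and the labels are hypothetical, and neither citing paper is confirmed to use this exact technique (they may use SMOTE or another method instead).

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Random over-sampling: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Row indices that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out

# Hypothetical toy data: features [income, debt_ratio], label 1 = default.
X = [[30, 0.4], [45, 0.2], [50, 0.1], [60, 0.3], [25, 0.9], [20, 0.8]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
print(sorted(Counter(y_bal).items()))  # [(0, 4), (1, 4)]
```

Resampling of this kind is normally applied only to the training split, after the train/test split, so that duplicated rows cannot leak into the evaluation data.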


Credit Risk Model Based on Central Bank Credit Registry Data

Doko

Kalajdziski

Mishkovski

2021

JRFM


Data science and machine-learning techniques help banks optimize enterprise operations, enhance risk analyses and gain a competitive advantage. There is a vast amount of research on credit risk, but to our knowledge, none of it uses a credit registry as the data source for modeling the probability of default of individual clients. The goal of this paper is to evaluate different machine-learning models in order to build an accurate credit risk assessment model using data from the real credit registry dataset of the Central Bank of the Republic of North Macedonia. We strongly believe that the model developed in this research will be an additional source of valuable information for commercial banks, as it leverages historical data on the country's entire population across all commercial banks. In this research, we compare five machine-learning models for classifying credit risk data: logistic regression, decision tree, random forest, support vector machines (SVM) and neural network. We evaluate the five models using different machine-learning metrics, and we propose a model based on the central bank's credit registry data, with a detailed methodology, that can predict credit risk from the credit history of the country's population. Our results show that the best accuracy is achieved by the decision tree trained on imbalanced data, with and without scaling, followed by random forest and logistic regression.
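Because the abstract reports its best accuracy on imbalanced data, it helps to see why accuracy alone can be misleading when defaults are rare. The sketch below uses a hypothetical 95:5 portfolio and a trivial classifier; the function names are illustrative, and nothing here is taken from the paper's own evaluation code.

```python
def accuracy(y_true, y_pred):
    # Share of all predictions that are correct.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def default_recall(y_true, y_pred, positive=1):
    # Share of actual defaults that the model flags as defaults.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

# Hypothetical imbalanced portfolio: 95 non-defaults (0), 5 defaults (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that labels every client as non-default.
y_pred = [0] * 100
print(accuracy(y_true, y_pred))        # 0.95
print(default_recall(y_true, y_pred))  # 0.0
```

The trivial model scores 95% accuracy while detecting no defaults, which is why class-sensitive metrics such as recall on the default class are usually reported alongside accuracy.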


“…(Some examples of images eliminated from the dataset) However, applying deep learning algorithms requires a large amount of meaningful, labeled data [41]. Moreover, to achieve better classification on the chosen dataset, the number of samples in the classes with few samples must be increased [42][43][44]. Therefore, data augmentation and standardization of the data size are needed to obtain a balanced class distribution.…”

Section: Figure 3. Some image examples eliminated from the dataset (unclassified)
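The snippet above combines two steps: standardizing image size and augmenting the under-represented classes until the class distribution is balanced. A minimal sketch follows, assuming images are small 2D lists of pixel values; the flip transforms, the nearest-neighbour resize, the target size and all names are hypothetical choices, since the snippet does not say which augmentation operations the bee-disease study used.

```python
from collections import Counter

def hflip(img):
    # Mirror each row left-to-right.
    return [row[::-1] for row in img]

def vflip(img):
    # Reverse the order of rows (top-to-bottom mirror).
    return img[::-1]

def resize_nearest(img, h, w):
    # Nearest-neighbour resize so every image has the same h x w shape.
    H, W = len(img), len(img[0])
    return [[img[r * H // h][c * W // w] for c in range(w)] for r in range(h)]

def augment_to_balance(images, labels, size=(4, 4)):
    """Resize all images to `size`, then add flipped copies of minority-class
    images until every class matches the largest class."""
    images = [resize_nearest(img, *size) for img in images]
    counts = Counter(labels)
    target = max(counts.values())
    transforms = [hflip, vflip, lambda im: vflip(hflip(im))]
    out_imgs, out_labels = list(images), list(labels)
    for label, n in counts.items():
        members = [im for im, lab in zip(images, labels) if lab == label]
        k = 0
        while n < target:
            src = members[k % len(members)]
            # Cycle through the sources first, then switch to the next transform.
            t = transforms[(k // len(members)) % len(transforms)]
            out_imgs.append(t(src))
            out_labels.append(label)
            n += 1
            k += 1
    return out_imgs, out_labels

# Hypothetical 2x2 "images": three of one class, one of a rarer class.
imgs = [[[1, 2], [3, 4]], [[0, 0], [0, 1]], [[5, 5], [5, 5]], [[9, 8], [7, 6]]]
labels = ["healthy", "healthy", "healthy", "diseased"]
out_imgs, out_labels = augment_to_balance(imgs, labels)
print(sorted(Counter(out_labels).items()))  # [('diseased', 3), ('healthy', 3)]
```

Unlike plain duplication, each added sample is a transformed copy, which gives the classifier slightly more variety for the rare classes.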

Arı hastalıklarının hibrit bir derin öğrenme yöntemi ile tespiti (Detection of bee diseases using a hybrid deep learning method)

Metlek

Kayaalp

2021

Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi


Theory and Methods: In this study, feature-extraction methods, one of the strengths of deep learning, were applied in two separate branches, and sharp changes in the images were detected. For classification, a multi-layer feedback artificial neural network (MLFB-ANN) model was used instead of the probability-based Softmax classifier. Results: The performance of the designed system was also compared with that of the Softmax classifier. In experiments on the same dataset covering six different bee diseases, the Softmax classifier achieved a 93.07% success rate, whereas the developed system achieved 95.04%. Conclusion: A hybrid method based on deep learning was proposed for classifying bee diseases, and successful results were obtained.
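The abstract contrasts its MLFB-ANN head with a Softmax classifier "based on probability calculation". For reference, the sketch below shows only what a softmax output layer does: it converts raw class scores into probabilities and picks the most probable class. The scores and function names are hypothetical, and this is not the paper's network.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow; result is unchanged
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_classify(logits):
    """Return the index of the most probable class and all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs

# Hypothetical scores for three classes from a network's last layer.
label, probs = softmax_classify([2.0, 1.0, 0.1])
print(label, [round(p, 3) for p in probs])  # 0 [0.659, 0.242, 0.099]
```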


“…To tackle the imbalance problem in credit scoring data, many studies have employed resampling techniques, such as under-sampling and over-sampling [3][4][5][6][7][8]. The major disadvantage of resampling techniques is that they incur overhead costs and introduce other problems, e.g., 1) information may be lost with under-sampling, 2) the final model may overfit with over-sampling, 3) the original data distribution may be changed, and 4) the model becomes more complex and has a high computational cost.…”

Section: Introduction (mentioning)

confidence: 99%
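The first drawback listed above, information loss under under-sampling, is easy to see in a minimal random under-sampling sketch. The function name and data are hypothetical; the snippet does not specify which under-sampling variant the cited studies used.

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Random under-sampling: keep a random subset of each larger class so
    every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sample without replacement
    keep.sort()  # restore the original row order
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical data: 8 non-defaults (0) and 2 defaults (1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_u, y_u = random_undersample(X, y)
print(sorted(Counter(y_u).items()))         # [(0, 2), (1, 2)]
print(len(X) - len(X_u), "rows discarded")  # 6 rows discarded
```

Balancing here discards 6 of the 8 non-default rows, so most of the majority-class information never reaches the model.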

A Novel Method for Credit Scoring Based on Cost-Sensitive Neural Network Ensemble

2021


Most existing studies on credit scoring adopt the classifier-ensemble concept to handle imbalanced datasets: they apply resampling methods to generate multiple training subsets for constructing multiple base classifiers. However, this approach leads to several problems that degrade classification performance, such as information loss, model overfitting, and computational cost. We therefore propose a novel ensemble approach for developing a credit scoring model based on a cost-sensitive neural network, called the Cost-sensitive Neural Network Ensemble (CS-NNE). In the proposed approach, multiple class weights are applied to the original training data, enabling the multiple base neural networks to account for the imbalanced classes. In this way, high diversity among the base classifiers can be achieved without the problems that resampling introduces. The approach's effectiveness is evaluated on five real-world credit datasets, one of which is a loan-request dataset provided by a financial institution in Thailand; the remaining datasets are publicly available and widely used in existing studies. The experimental results show that, on imbalanced credit datasets such as the Thai credit dataset, the proposed CS-NNE approach improves predictive performance over a single neural network by 1.36%, 15.67%, and 6.11% in Area under the ROC Curve (AUC), Default Detection Rate (DDR), and G-Mean (GM), respectively, while also achieving the best Misclassification Cost (MC). The proposed CS-NNE approach can effectively address the class imbalance problem and outperforms many existing models. The prediction model strikes a good balance between the default (bad credit applicants) and non-default (good credit applicants) classes, whereas existing approaches favor the non-default class over defaults (high specificity but low DDR), which results in non-performing loans (NPLs).
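The cost-sensitive alternative to resampling keeps every row and instead weights the loss by class. The sketch below assumes inverse-frequency weights w_c = n / (k · n_c) (the same heuristic as scikit-learn's `class_weight="balanced"`); the abstract does not state how CS-NNE chooses its multiple weight sets, and the ensemble itself is not shown. It also computes the abstract's metrics under common definitions: DDR as recall on the default class, G-Mean as √(DDR × specificity), and MC as the average cost per sample with a hypothetical 5:1 cost ratio for missed defaults versus false alarms.

```python
import math
from collections import Counter

def inverse_frequency_weights(y):
    """Class weight w_c = n / (k * n_c): the rarer the class, the larger its weight."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

def weighted_log_loss(y_true, p_default, weights):
    """Binary cross-entropy in which each sample's loss is scaled by the
    weight of its true class, so errors on rare defaults cost more."""
    eps = 1e-12
    total = 0.0
    for t, p in zip(y_true, p_default):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += weights[t] * (-math.log(p) if t == 1 else -math.log(1 - p))
    return total / len(y_true)

def credit_metrics(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """DDR, specificity, G-mean and average misclassification cost (MC).
    The 5:1 cost ratio is a hypothetical default, not taken from the paper."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    ddr = tp / (tp + fn) if tp + fn else 0.0    # default detection rate
    spec = tn / (tn + fp) if tn + fp else 0.0   # share of non-defaults kept
    return {
        "DDR": ddr,
        "specificity": spec,
        "G-mean": math.sqrt(ddr * spec),
        "MC": (cost_fn * fn + cost_fp * fp) / len(pairs),
    }

# Hypothetical data: 8 non-defaults (0), 2 defaults (1).
y_true = [0] * 8 + [1] * 2
print(inverse_frequency_weights(y_true))  # {0: 0.625, 1: 2.5}
y_pred = [0] * 7 + [1] + [1, 0]  # one false alarm, one missed default
print(credit_metrics(y_true, y_pred))
```

With these weights a confidently missed default (predicted default probability 0.2) costs four times as much loss as an equally confident mistake on a non-default, which pushes training toward a higher DDR without discarding or duplicating any rows.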
