Sihem Khemakhem scite author profile

Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines

Purpose: Credit scoring datasets are generally unbalanced, because the number of repaid loans is higher than the number of defaulted ones. Classification of these data is therefore biased toward the majority class, which in practice means that a classifier tends to assign a mistaken “good borrower” status even to “very risky borrowers”. In addition to the use of statistical and machine learning classifiers, this paper explores the relevance and performance of sampling models combined with statistical prediction and artificial intelligence techniques to predict and quantify the default probability based on real-world credit data.

Design/methodology/approach: A real database from a Tunisian commercial bank was used, and the unbalanced data issue was addressed with random over-sampling (ROS) and the synthetic minority over-sampling technique (SMOTE). Performance was evaluated in terms of the confusion matrix and the receiver operating characteristic curve.

Findings: The results indicated that the combination of intelligent and statistical techniques with re-sampling approaches is promising for default rate management and provides accurate credit risk estimates.

Originality/value: This paper empirically investigates the effectiveness of ROS and SMOTE in combination with logistic regression, artificial neural networks and support vector machines. The authors address the role of sampling strategies in the Tunisian credit market and their impact on credit risk. These sampling strategies may help financial institutions reduce the costs of erroneous classification in comparison with the unbalanced original data, and may serve as a means of improving the bank’s performance and competitiveness.
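The two re-sampling approaches named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy feature rows and the parameter `k` are hypothetical, and the SMOTE variant is a simplified version that interpolates between a minority row and one of its nearest minority neighbours.

```python
import random


def random_over_sample(minority, n_new, rng):
    """ROS: create n_new rows by copying randomly chosen minority rows."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority, n_new, k, rng):
    """Simplified SMOTE: interpolate between a minority row and one of
    its k nearest minority neighbours (squared Euclidean distance)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by distance to the chosen base row.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features per defaulted borrower.
defaults = [[0.9, 0.2], [0.8, 0.3], [1.0, 0.1], [0.7, 0.25]]
rng = random.Random(0)
ros_rows = random_over_sample(defaults, 6, rng)
smote_rows = smote(defaults, 6, k=2, rng=rng)
```

ROS only repeats existing minority rows, whereas the SMOTE-style rows are new points lying between existing minority rows. In either case the new rows would be added to the training set only, so that the test data keep the original class proportions.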


Predicting credit risk on the basis of financial and non-financial variables and data mining

Khemakhem

Boujelbène

2018

RAF


Purpose: Data mining for predicting credit risk is a beneficial tool for financial institutions to evaluate the financial health of companies. However, the need to select parameters and the presence of unbalanced data sets are typical problems of this technique. This study aims to provide a new method for evaluating credit risk that takes into account not only financial and non-financial variables, but also the class imbalance.

Design/methodology/approach: The most significant financial and non-financial variables were determined to build a credit scoring model and identify the creditworthiness of companies. Moreover, the Synthetic Minority Oversampling Technique was used to solve the problem of class imbalance and improve the performance of the classifier. Artificial neural networks and decision trees were designed to predict default risk.

Findings: Results showed that profitability ratios, repayment capacity, solvency, duration of a credit report, guarantees, size of the company, loan number, ownership structure and the duration of the corporate banking relationship were the key factors in predicting default. Both algorithms were also found to be highly sensitive to class imbalance. With balanced data, however, the decision trees displayed higher predictive accuracy for the assessment of credit risk than the artificial neural networks.

Originality/value: Classification results depend on the appropriateness of the data characteristics and on choosing an appropriate analysis algorithm for the data sets. The selection of financial and non-financial variables, together with the resolution of class imbalance, allows companies to assess their credit risk successfully.
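Why class imbalance matters for evaluation can be shown with a small worked example. The sketch below is an illustration only, assuming a hypothetical portfolio of 95 repaid and 5 defaulted loans and a classifier that labels every borrower as good; none of these numbers come from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn), treating `positive` as the default class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def rates(tp, fn, fp, tn):
    """Accuracy, sensitivity (defaults caught) and specificity."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Hypothetical portfolio: 0 = repaid, 1 = defaulted.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a classifier that calls every borrower "good"

accuracy, sensitivity, specificity = rates(*confusion_counts(y_true, y_pred))
```

Here accuracy is 0.95 even though sensitivity is 0.0, meaning no defaulter is identified. This is the reason for reading the full confusion matrix, rather than accuracy alone, on unbalanced credit data.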


Support vector machines for credit risk assessment with imbalanced datasets

Khemakhem

Boujelbène

2018

IJDMMM





Sihem Khemakhem

Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines

Predicting credit risk on the basis of financial and non-financial variables and data mining

Support vector machines for credit risk assessment with imbalanced datasets
