Application scorecards allow lenders to assess the creditworthiness of loan applicants and decide whether to accept them. Scorecard accuracy is crucial for minimizing bad-debt losses and maximizing returns. In this paper, we extend prior benchmarking studies that experimentally compare the performance of classification techniques for discriminating between good and bad applications. We evaluate a range of cost-sensitive learning methods in terms of their ability to boost the profitability of scorecards. These methods account for the variable misclassification costs involved in rejecting good loan applications and accepting bad ones. We propose an approach to estimate these misclassification costs, and we evaluate various approaches to handling missing credit bureau scores. The results of a case study involving a Romanian nonbanking financial institution (NBFI) indicate that cost-sensitive learning complements the NBFI's existing state-of-the-art scorecard. The best-performing cost-sensitive models increase profitability across the three business channels, with single-digit improvements for two of the channels and a double-digit increase for the third. This result is partly explained by the default rate, which is higher for the latter channel and therefore offers greater potential for improving profitability.
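The cost-sensitive decision logic described above can be sketched as an expected-cost comparison. This is a minimal illustration, not the paper's actual method: the function name and the two cost parameters (the loss incurred when a defaulting applicant is accepted, and the profit foregone when a repaying applicant is rejected) are hypothetical stand-ins for the misclassification costs the paper estimates.

```python
def cost_sensitive_accept(p_default, loss_if_default, profit_if_repaid):
    """Accept a loan iff the expected cost of accepting it
    (probability of default times the loss given default) is lower
    than the expected cost of rejecting it (probability of repayment
    times the foregone profit)."""
    expected_cost_accept = p_default * loss_if_default
    expected_cost_reject = (1.0 - p_default) * profit_if_repaid
    return expected_cost_accept < expected_cost_reject
```

Because the two costs are asymmetric and vary per applicant, this decision rule can differ from simply thresholding the default probability at 0.5, which is the core motivation for cost-sensitive learning.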
Probability-of-default estimation via machine learning on historical data is widely studied in credit risk modeling. In this work, we investigate the use of machine learning for a finer-grained risk estimation task, namely spot factoring. Here, the goal is to estimate the likelihood that an invoice will be paid within an acceptable timeframe, so risk relates more closely to how overdue an invoice becomes. Based on this observation, we investigate three possible machine learning tasks for estimating this risk: binary classification with a predetermined cutoff on overdue days; regression of the number of overdue days; and learning-to-rank, which optimizes the risk-related ranking over the full range of instances. We model and evaluate these tasks using real-life spot factoring data. Finally, we perform a profit-driven evaluation showing that, for spot factoring, regression models can yield higher profits and spread risk better than classification and ranking models.
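The three task formulations differ only in how the observed overdue days are turned into a learning target. A minimal sketch of that transformation (function name, cutoff value, and representation choices are illustrative assumptions, not the paper's implementation):

```python
def make_targets(overdue_days, cutoff=30):
    """Derive the three targets from observed overdue days:
    - binary: 1 if the invoice was paid within `cutoff` days overdue
    - regression: the raw number of overdue days
    - ranking: invoice indices ordered from least to most overdue,
      i.e. the ideal risk-related ordering a ranker would learn."""
    binary = [1 if d <= cutoff else 0 for d in overdue_days]
    regression = list(overdue_days)
    ranking = sorted(range(len(overdue_days)), key=lambda i: overdue_days[i])
    return binary, regression, ranking
```

Note that the binary target discards all information beyond the cutoff, while regression preserves the full overdue-days signal, which is consistent with the abstract's finding that regression models better spread the risk.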
Many companies, such as credit granting companies, must decide daily whether to grant or deny customer or invoice loans. Increasingly, machine learning is used to learn probability-of-default models from previously granted cases, for which the outcome is known, i.e., whether the client repaid or defaulted. However, because the outcome can only be observed for the granted cases, the data inherently suffers from sample selection bias, and caution is needed when applying the probability-of-default model to the full through-the-door population. Reject inference studies this problem by asking whether the unlabeled rejected instances can help improve a classifier trained only on granted instances, e.g., using semi-supervised learning. In contrast, we investigate under what circumstances a model trained on the granted instances, with known outcome, can be used on all possible instances. For this, we believe a model should indicate when it cannot reliably predict the outcome; that is, it should refrain from making predictions on instances unlike those on which it was trained. Otherwise, the credit granting company would expose itself to great risk, and experts could lose their trust in the predictions. We discuss the similarities and differences between this problem and novelty detection, classification with a reject option, and reject inference. We compare a number of methods that combine novelty detection with classification, obtaining decent results even for two-stage methods, especially when using data on existing instances with unknown outcome.
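A two-stage combination of novelty detection and classification, as discussed above, can be sketched as follows. This is an assumed, deliberately simple scheme: the nearest-neighbor distance check stands in for a proper novelty detector, and the function and parameter names are hypothetical.

```python
def predict_with_reject(x, train_xs, classifier, radius):
    """Two-stage scheme: abstain when `x` is farther than `radius`
    from every training instance (a crude novelty check); otherwise
    defer to the classifier trained on granted instances."""
    if min(abs(x - t) for t in train_xs) > radius:
        return None  # abstain: x is unlike the training data
    return classifier(x)
```

Returning `None` (rather than a forced prediction) is what lets the credit granting company route out-of-distribution applicants to human experts instead of trusting an unreliable score.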
Many advanced solving algorithms for constraint programming problems are highly configurable. The research area of algorithm configuration investigates ways of automatically configuring these solvers as well as possible. In this paper, we specifically focus on algorithm configuration in which the objective is to decrease the time it takes the solver to find an optimal solution. In this setting, adaptive capping is a popular technique that reduces the overall runtime of the search for good configurations by adaptively setting the solver's timeout to the best runtime found so far. Additionally, sequential model-based optimization (SMBO), in which one iteratively learns a surrogate model that can predict the runtime of unseen configurations, has proven to be a successful paradigm. Unfortunately, adaptive capping and SMBO have thus far remained incompatible: under adaptive capping, one cannot observe the true runtime of runs that time out, precluding the typical use of SMBO. To marry adaptive capping and SMBO, we instead use SMBO to model the probability that a configuration will improve on the best runtime achieved so far, for which we propose several decomposed models. These models also allow defining prior probabilities for each hyperparameter. The experimental results show that our DeCaprio method speeds up hyperparameter search compared to random search and the seminal adaptive capping approach of ParamILS.
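The interaction between adaptive capping and a surrogate that scores improvement probability can be sketched as below. This is a hedged illustration of the general idea, not the DeCaprio algorithm itself: `run_with_timeout` and `score` are assumed callables, and the simple sort-then-run loop replaces the paper's iterative surrogate updates.

```python
def capped_config_search(configs, run_with_timeout, score, n_trials=20):
    """Capped configuration search: each candidate runs with a timeout
    equal to the best runtime so far (adaptive capping), and candidates
    are tried in order of a surrogate `score` estimating their chance of
    improving on that best. `run_with_timeout(cfg, cap)` returns the
    runtime, or None if the run was capped (true runtime unobserved)."""
    best_cfg, best_time = None, float("inf")
    # try the most promising configurations first
    for cfg in sorted(configs, key=score, reverse=True)[:n_trials]:
        runtime = run_with_timeout(cfg, best_time)
        if runtime is not None and runtime < best_time:
            best_cfg, best_time = cfg, runtime
    return best_cfg, best_time
```

The key point matching the abstract: capped runs yield only the censored observation "no improvement", which is exactly why the surrogate models the probability of improvement rather than the runtime itself.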