Bankruptcy prediction has been a popular and challenging research area for decades. Most prediction models are built using traditional data such as financial figures, stock market data and firm-specific variables. We complement such [...] ensemble model is built that combines the relational model's output scores with the structured data. We find that this ensemble model outperforms the base model when detecting the riskiest firms, especially when predicting two years ahead.
* Julie and Marija contributed equally.
Based on two datasets containing Loss Given Default (LGD) observations of home equity and corporate loans, we consider non-linear and non-parametric techniques to model and forecast LGD. These techniques include non-linear Support Vector Regression (SVR), a regression tree and a two-stage model combining a linear regression with SVR. We compare these models with an ordinary least squares linear regression. In addition, we incorporate several macroeconomic variables to estimate the influence of the economic state on loan losses. We investigate whether a Box-Cox transformation of the macroeconomic features improves the linear regression model. Because the distributions are unstable, both out-of-time and out-of-sample setups are considered. The two-stage model outperforms the other techniques when forecasting out-of-time, while the non-parametric regression tree is the best performer when forecasting out-of-sample. The fully non-linear SVR performs poorly in both comprehensibility and accuracy. The incorporation of macroeconomic variables significantly improves the prediction performance of most of the models. These conclusions can help financial institutions that estimate LGD under the Internal Ratings Based Approach of the Basel Accords, where the downturn LGD is needed to calculate the capital requirements.
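The difference between the out-of-sample and out-of-time setups mentioned above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the record fields (`loan_id`, `default_date`, `lgd`), the 30% test fraction, the cutoff date and the toy loans are all assumptions made up for the example.

```python
import random
from datetime import date


def out_of_sample_split(records, test_fraction=0.3, seed=0):
    """Random split that ignores time: train and test cover the same period."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def out_of_time_split(records, cutoff):
    """Chronological split: train strictly before the cutoff, test on or after it."""
    train = [r for r in records if r["default_date"] < cutoff]
    test = [r for r in records if r["default_date"] >= cutoff]
    return train, test


# Hypothetical toy loans (invented values, not data from the paper).
loans = [
    {"loan_id": i,
     "default_date": date(2005 + i % 6, 1 + i % 12, 1),
     "lgd": (i % 10) / 10}
    for i in range(60)
]

oos_train, oos_test = out_of_sample_split(loans)
oot_train, oot_test = out_of_time_split(loans, cutoff=date(2009, 1, 1))

print(len(oos_train), len(oos_test))
print(len(oot_train), len(oot_test))
```

In the out-of-time setup every test loan defaults after every training loan, so a model is evaluated on a period it has never seen. That is the setting in which a shift in the LGD distribution would show up.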
Summary
Banks are continuously looking for novel ways to leverage their existing data assets. A major source of data that has not yet been used to its full extent is massive fine-grained payment data on the bank's customers. In this paper, a design is proposed that builds predictive credit scoring models by using the fine-grained payment data. Using a real-life data set of 183 million transactions made by 2.6 million customers, we show that the scalable implementation that is put forward leads to a significant improvement in the area under the receiver operating characteristic curve, with only seconds of computation needed. When investigating the 1% riskiest customers, twice as many defaulters are detected when using the payment data. Such an improvement has a big effect on the overall working of the bank, from applicant scoring to minimum capital requirements.
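The two evaluation measures named above, the area under the ROC curve and the number of defaulters found among the riskiest customers, can be computed as in the sketch below. It is an illustration under stated assumptions: the function names are hypothetical, the ten labels and scores are invented toy values rather than results from the paper, and a 20% cutoff replaces the paper's 1% because the toy set is tiny.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random defaulter is scored above
    a random non-defaulter (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def defaulters_in_top(labels, scores, fraction):
    """Count defaulters among the `fraction` of customers with the highest scores."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k])


# Hypothetical toy data: 1 = defaulter, 0 = non-defaulter.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
base_scores = [0.90, 0.30, 0.20, 0.80, 0.40, 0.10, 0.15, 0.05, 0.25, 0.12]
payment_scores = [0.95, 0.85, 0.40, 0.50, 0.30, 0.10, 0.15, 0.05, 0.25, 0.12]

for name, scores in [("base", base_scores), ("with payments", payment_scores)]:
    print(name, round(roc_auc(labels, scores), 3),
          defaulters_in_top(labels, scores, fraction=0.2))
```

The pairwise AUC is quadratic in the number of customers, which is fine for a toy example. On millions of customers a rank-based computation would be the more practical choice.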