Abstract. In this paper, we look at ways to measure the classification performance of a scoring system and the overall characteristics of a scorecard. We stick to the idea that we will measure the scoring system by how well it classifies, but there are still problems in measuring its performance. This is because there are different ways to define the misclassification rate, mainly due to the sample that we use to check this rate. If we test how good the system is on the sample of customers we used to build the system, the results will be better than if we did the test on another sample. This idea is illustrated in this paper. Two measures, Mahalanobis distance and KS score, are used in the paper.
Keywords: credit-scoring systems, measuring scorecard, classification, holdout, Mahalanobis distance, KS score.
1 Introduction
Having built a credit or behavioral scorecard, the obvious question is, "How good is it?" This begs the question of what we mean by good. The obvious answer is in distinguishing the good from the bad, because we want to treat these groups in different ways in credit-scoring systems, for example, accepting the former for credit and rejecting the latter. Behavioral scoring systems are used in a more subtle way, but even if we stick to the idea that we will measure the scoring system by how well it classifies, there are still problems in measuring its performance. This is because there are different ways to define the misclassification rate, mainly due to the sample that we use to check this rate. If we test how good the system is on the sample of customers we used to build the system, the results will be much better than if we did the test on another sample. This must follow because built into the classification system are some of the nuances of that data set that do not appear in other data sets. Thus Section 2 looks at how to test the classification rate using a sample, called the holdout sample, separate from the one used to build the scoring system. This is a very common thing to do in the credit-scoring industry because of the availability of very large samples of past customers, but it is wasteful of data in that one does not use all the information available to help build the best scoring system. There are times, however, when the amount of data is limited, for example, when one is building a system for a completely new group of customers or products. In that case, one can test