2015
DOI: 10.18267/j.aop.472
|View full text |Cite
|
Sign up to set email alerts
|

Data representativeness problem in credit scoring

Abstract: When building models, it is common to split the whole dataset into a development and a validation sample. In some cases, using random sampling instead of stratified sampling can lead to loss of representativeness of final samples. In such cases, a model built on these data gives different or unexpected results when its performance is measured on the validation sample. In the business area, a lack of representativeness can cause interpretative problems and can have a huge financial impact when a biased model is… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3

Citation Types

0
3
0

Year Published

2021
2021
2021
2021

Publication Types

Select...
1

Relationship

0
1

Authors

Journals

citations
Cited by 1 publication
(3 citation statements)
references
References 12 publications
0
3
0
Order By: Relevance
“…In random sampling, each data instance has the same probability of being chosen, and this can reduce the bias of model performance. However, it produces a high variance in model performance, if a dataset has an abnormal distribution or the size of the sample is small [3,4]. Systematic sampling is a method of extracting data by randomly arranging data and skipping at regular intervals [5].…”
Section: Introductionmentioning
confidence: 99%
See 2 more Smart Citations
“…In random sampling, each data instance has the same probability of being chosen, and this can reduce the bias of model performance. However, it produces a high variance in model performance, if a dataset has an abnormal distribution or the size of the sample is small [3,4]. Systematic sampling is a method of extracting data by randomly arranging data and skipping at regular intervals [5].…”
Section: Introductionmentioning
confidence: 99%
“…Stratified sampling is a method of first dividing a population into layers, so that they do not overlap, and then sampling from each layer. It uses the internal structure (layers) and the distribution of a dataset [4]. D-optimal [6] and the most descriptive compound method (MDC) [7] are advanced stratified sampling methods.…”
Section: Introductionmentioning
confidence: 99%
See 1 more Smart Citation