2019
DOI: 10.3934/fods.2019016

Issues using logistic regression with class imbalance, with a case study from credit risk modelling

Abstract: The class imbalance problem arises in two-class classification problems when the less frequent (minority) class is observed much less often than the majority class. This characteristic is endemic in many problems, such as default modeling and fraud detection. Recent work by Owen [19] has shown that, in a theoretical setting of infinite imbalance, logistic regression behaves in such a way that all data in the rare class can be replaced by their mean vector without changing the coefficient estimates. We build on…
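
As a minimal numerical sketch of the mean-vector result attributed to Owen [19] (our illustration, not the authors' code): in Owen's limit, as we understand it, the majority-class sample grows without bound while the rare class stays fixed, the intercept diverges to minus infinity, and the slope solves ∫ exp(x'β)(x - x̄) dF0(x) = 0, where F0 is the majority-class distribution and x̄ is the rare-class mean vector. The rare class therefore enters only through x̄; for F0 = N(μ0, Σ) the limit is β = Σ^{-1}(x̄ - μ0). The code below assumes a single feature, a standard-normal majority class of 10,000 points built from a quantile grid, and five made-up rare-class points. `fit_logistic_1d` is a hypothetical Newton-Raphson helper, not a library API. The model is fitted twice: once on the original rare-class points and once with every rare-class point replaced by their mean.

```python
import math
from statistics import NormalDist


def log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def loglik(alpha, beta, xs, ys):
    """Bernoulli log-likelihood of the model logit P(y=1|x) = alpha + beta*x."""
    return sum(y * (alpha + beta * x) - log1pexp(alpha + beta * x)
               for x, y in zip(xs, ys))


def fit_logistic_1d(xs, ys, tol=1e-10, max_iter=100):
    """Hypothetical helper: maximum-likelihood fit by damped Newton-Raphson."""
    n1 = sum(ys)
    alpha, beta = math.log(n1 / (len(ys) - n1)), 0.0  # start at the base rate
    ll = loglik(alpha, beta, xs, ys)
    for _ in range(max_iter):
        # Score vector (g0, g1) and Fisher information [[a, b], [b, c]].
        g0 = g1 = a = b = c = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(alpha + beta * x)
            r, w = y - p, p * (1.0 - p)
            g0 += r
            g1 += r * x
            a += w
            b += w * x
            c += w * x * x
        det = a * c - b * b
        d_alpha = (c * g0 - b * g1) / det
        d_beta = (a * g1 - b * g0) / det
        # Halve the Newton step until the log-likelihood does not decrease.
        step = 1.0
        new_ll = loglik(alpha + d_alpha, beta + d_beta, xs, ys)
        while new_ll < ll and step > 1e-8:
            step /= 2
            new_ll = loglik(alpha + step * d_alpha, beta + step * d_beta, xs, ys)
        alpha += step * d_alpha
        beta += step * d_beta
        ll = new_ll
        if max(abs(step * d_alpha), abs(step * d_beta)) < tol:
            break
    return alpha, beta


# Hypothetical data: a deterministic standard-normal majority class (the
# quantile grid of N(0, 1)) and five rare-class points with mean 1.5.
N0 = 10_000
majority = [NormalDist().inv_cdf((k + 0.5) / N0) for k in range(N0)]
minority = [0.5, 1.0, 1.5, 2.0, 2.5]
x_bar = sum(minority) / len(minority)
ys = [0] * N0 + [1] * len(minority)

# Fit on the original rare-class points.
_, beta_full = fit_logistic_1d(majority + minority, ys)
# Refit after replacing every rare-class point by the rare-class mean.
_, beta_mean = fit_logistic_1d(majority + [x_bar] * len(minority), ys)

print(f"slope, original rare-class points:  {beta_full:.4f}")
print(f"slope, rare class replaced by mean: {beta_mean:.4f}")
print(f"Gaussian limit x_bar - mu0:         {x_bar:.4f}")
```

At this 2,000:1 ratio the two slopes should nearly coincide and lie close to x̄ - μ0 = 1.5. They agree only approximately, because the replacement is exact only in the limit: at finite sample sizes each rare-class point still enters the score equations through its own weight 1 - p_i.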

Cited by 11 publications (7 citation statements)
References 22 publications

“…Our goal is to predict this outcome based on loan and borrower features available at origination. This setup is consistent with Li et al [14], although our sample is much larger.…”
Section: High-sensitivity and High-specificity Regions (supporting; confidence: 93%)
“…Theorem 4.4 (Exactly Exponential w). Suppose w has the form in (14). Suppose Conditions 3-5 hold, with k = 0 in (11).…”
Section: Convergence Under Infinite Imbalance (mentioning; confidence: 99%)
“…It is noteworthy that most of the reported studies are based on a combination of feature selection and relatively simple models such as Multivariate Logistic Regression (MLR) or Diagonalized Linear Discriminant Analysis (DLDA). However, because there is no clear correlation between the expression of a gene and a NAC response vector (mean PCC ), linear models may not be the appropriate choice for classifying the NAC response [28]. In addition, some studies have utilized prior marker information to restrict the search space of optimal markersets, but these studies only utilized the associated genes of predefined mechanisms such as the immune response [29] and the E2F pathway [19, 20], which may not fully represent the mechanisms of the NAC response.…”
Section: Introduction (mentioning; confidence: 99%)
“…It is precisely this feature that makes the DC regression more efficient with imbalanced classifiers. A recent study on imbalanced classifiers is Li et al (2019). Albert and Anderson (1984), Allison (2008), and Ghosh et al (2018) study convergence issues.…”
(mentioning; confidence: 99%)