2008
DOI: 10.1016/j.patcog.2007.11.008

Do unbalanced data have a negative effect on LDA?

Abstract: For two-class discrimination, Ref. [Xie 2007] claimed that, when the covariance matrices of the two classes were unequal, a (class) unbalanced dataset had a negative effect on the performance of linear discriminant analysis (LDA). Through re-balancing 10 real-world datasets, Ref. [Xie 2007] provided empirical evidence to support the claim, using AUC (Area Under the receiver operating characteristic Curve) as the performance metric. We suggest that such a claim is vague if not misleading; there is no sol…
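Background sketch (ours, not the authors' code): the LDA direction is w = S^{-1}(mu1 - mu0), where S is a pooled covariance that weights each class's covariance by that class's share of the training data. AUC ignores the intercept and any positive rescaling of w, so only the direction of w matters for AUC. When the two covariance matrices are equal, the class proportions drop out of that direction. When they are unequal, re-balancing changes S and therefore w. The sketch works with population quantities and proportion weights, a simplification of the sample weights (n_k - 1)/(n - 2). The helper `lda_direction` and all parameter values are hypothetical.

```python
import numpy as np

def lda_direction(mu0, mu1, sigma0, sigma1, pi1):
    """Unit LDA direction S^{-1}(mu1 - mu0), with S = pi0*Sigma0 + pi1*Sigma1.

    pi1 is the (hypothetical) proportion of class C1 in the training data.
    """
    pi0 = 1.0 - pi1
    pooled = pi0 * np.asarray(sigma0) + pi1 * np.asarray(sigma1)
    w = np.linalg.solve(pooled, np.asarray(mu1) - np.asarray(mu0))
    # AUC is unchanged by positive rescaling, so compare unit vectors only.
    return w / np.linalg.norm(w)

# Hypothetical two-dimensional example.
mu0 = np.zeros(2)
mu1 = np.array([1.0, 1.0])
equal = np.eye(2)
unequal = np.diag([4.0, 0.25])

for pi1 in (0.5, 0.1):  # balanced vs. unbalanced class proportions
    print(pi1,
          lda_direction(mu0, mu1, equal, equal, pi1),    # same for both pi1
          lda_direction(mu0, mu1, equal, unequal, pi1))  # changes with pi1
```

With equal covariances both proportions give the direction (0.707, 0.707). With unequal covariances the direction moves from about (0.243, 0.970) at pi1 = 0.5 to about (0.580, 0.815) at pi1 = 0.1. That shift is the only route by which re-balancing can alter LDA's AUC in this population-level setting.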

Cited by 56 publications (51 citation statements). References: 12 publications.
“…However, a study by Xue and Titterington [21] revealed that there is no reliable empirical evidence to support the claim that an unbalanced data set negatively impacts the performance of the LDA/BTM approaches. Further, a recent study by López et al. [22] shows that the unbalanced ratio by itself does not have the most significant effect on the classifiers' performance, but there are other issues such as (a) the presence of small disjuncts, (b) the lack of density, (c) the class overlapping, (d) the noisy data, (e) the management of borderline examples, and (f) the dataset shift that must be taken into account.…”
Section: Limitations and Threats To Validity (mentioning; confidence: 99%)
“…1 shows a motivating example, using a scatter plot and a panel of nine boxplots of AUC to illustrate visually the fact that rebalancing the training data can often improve the performance of LDA in terms of AUC [5], [6]. This example is extracted from an experiment on simulated data arising from two four-dimensional, Gaussian-distributed classes $C_0$ and $C_1$.…”
Section: Notation (mentioning; confidence: 99%)
“…This example is extracted from an experiment on simulated data arising from two four-dimensional, Gaussian-distributed classes $C_0$ and $C_1$. With a slightly different setting, the experiment explores more rebalancing scenarios than in [6]. It includes the following four steps.…”
Section: Notation (mentioning; confidence: 99%)
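A simulation in the spirit of the experiment quoted above can be sketched as follows. It is not a reconstruction of that experiment, whose four steps and exact settings are not given here. Every parameter is a hypothetical choice: the class means and unequal covariances of $C_0$ and $C_1$, the 500:50 training split, the number of repetitions, and the use of random undersampling as the rebalancing method. LDA uses the sample pooled covariance. AUC is the Mann-Whitney estimate on a fixed, balanced test set. The sketch makes no claim about which setting wins, and the outcome can change with the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings, not those of the cited experiment.
MU0, MU1 = np.zeros(4), np.full(4, 0.5)
SIGMA0, SIGMA1 = np.eye(4), np.diag([4.0, 2.0, 0.5, 0.25])  # unequal covariances

def sample(n0, n1):
    """Draw n0 points from class C0 and n1 points from class C1."""
    return (rng.multivariate_normal(MU0, SIGMA0, size=n0),
            rng.multivariate_normal(MU1, SIGMA1, size=n1))

def fit_lda(x0, x1):
    """Sample LDA direction S_pooled^{-1} (mean1 - mean0)."""
    n0, n1 = len(x0), len(x1)
    s0, s1 = np.cov(x0, rowvar=False), np.cov(x1, rowvar=False)
    pooled = ((n0 - 1) * s0 + (n1 - 1) * s1) / (n0 + n1 - 2)
    return np.linalg.solve(pooled, x1.mean(axis=0) - x0.mean(axis=0))

def auc(scores0, scores1):
    """Mann-Whitney estimate of P(score of C1 > score of C0); ties count 1/2."""
    diff = scores1[:, None] - scores0[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def rebalance(x0, x1):
    """Randomly undersample the larger class down to the smaller class size."""
    n = min(len(x0), len(x1))
    return (x0[rng.choice(len(x0), n, replace=False)],
            x1[rng.choice(len(x1), n, replace=False)])

test0, test1 = sample(1000, 1000)  # balanced test set, reused for every fit
results = {"unbalanced": [], "rebalanced": []}
for _ in range(20):
    train0, train1 = sample(500, 50)  # C0 is the majority class
    for name, (a, b) in (("unbalanced", (train0, train1)),
                         ("rebalanced", rebalance(train0, train1))):
        w = fit_lda(a, b)
        results[name].append(auc(test0 @ w, test1 @ w))

for name, vals in results.items():
    print(f"{name}: mean AUC {np.mean(vals):.3f} (sd {np.std(vals):.3f})")
```

Undersampling is only one way to rebalance. Oversampling the minority class would change the pooled covariance without discarding data, and might be worth comparing against undersampling.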
“…Even though the LDA has been extensively studied [7][8][9], the effect of unbalanced training datasets using electroencephalographic (EEG) data and the number of patterns necessary to reach a performance plateau have not been tested. That is, the point at which no significant performance gain will exist when adding more training patterns has not been determined.…”
Section: Introduction (mentioning; confidence: 99%)