2010
DOI: 10.1186/1471-2105-11-523

Class prediction for high-dimensional class-imbalanced data

Abstract: Background: The goal of class prediction studies is to develop rules to accurately predict the class membership of new samples. The rules are derived using the values of the variables available for each subject: the main characteristic of high-dimensional data is that the number of variables greatly exceeds the number of samples. Frequently the classifiers are developed using class-imbalanced data, i.e., data sets where the number of samples in each class is not equal. Standard classification methods used on cla…

Cited by 197 publications (130 citation statements)
References 40 publications (45 reference statements)
“…Class imbalance occurs frequently in QSAR and drug discovery datasets [14, 65, 66, 67]. This could be for a number of reasons; however in this context it is due to lack of publically available data for the minority class, poorly-moderately absorbed compounds, in the literature.…”
Section: Results (mentioning)
confidence: 99%
“…Another problem with under-sampling is that in order to assess the predictability of the balanced training set fairly, the validation set will also have to be adjusted to mirror the training set in terms of distribution of the data, but again this reduces the dataset size in the validation set and increases the variability of the results [14]. However the models built using this equal distribution should be better models to predict both poorly and highly-absorbed compounds if a big enough dataset is used.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, considering it with class-imbalance presents an additional source of difficulties for prediction, as it biases classification towards majority class for most classifiers (see, e.g. experimental analyses from Blagus and Lusa (2010)). The attribute (feature) selection is often applied in standard balanced classification to enhance predictive performance.…”
Section: Feature Ensembles and Class Imbalance (mentioning)
confidence: 99%
“…Imbalanced datasets might lead to overfitting of the training algorithms to the most common class and many mistakes in the least common class, leading to a poor generalisation performance (Huang et al. 2006; Blagus and Lusa 2010). A common solution to the overfitting problem in imbalanced datasets is using cost-sensitive learning ANN.…”
Section: Third Step: Optimisation Of the Cost-sensitive Learning Para (mentioning)
confidence: 99%