2014
DOI: 10.1038/nmeth.3045

Predictor performance with stratified data and imbalanced classes

Cited by 8 publications (3 citation statements)
References 4 publications
“…We then conducted a receiver operating characteristics (ROC) curve analysis for %dfi and %ASA to elucidate their ability to distinguish between disease and neutral phenotypes of nsSNVs. A randomly generated test set consisting of 10% of the entire data set (which only includes nsSNVs at interfaces) was used and the remaining 90% was used for training. The area under the curve (AUC) for dfi is 0.71 and 0.56 for ASA [Fig.…”
Section: Results (mentioning)
Confidence: 99%
“…A randomly generated test set consisting of 10% of the entire data set (which only includes nsSNVs at interfaces) was used and the remaining 90% was used for training. 4,49 The area under the curve (AUC) for dfi is 0.71 and 0.56 for ASA [Fig. 5(B)].…”
Section: Figure (mentioning)
Confidence: 99%
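
For readers unfamiliar with the AUC values quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from per-sample scores. It is not the citing authors' code. It uses the standard pairwise-ranking interpretation of AUC (the fraction of positive/negative pairs in which the positive sample scores higher, with ties counted as half). The function name `roc_auc`, the label coding (1 = disease, 0 = neutral), and the example scores are all hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical held-out test set: scores from some predictor, 1 = disease, 0 = neutral.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to a predictor that ranks no better than chance, which is why the quoted 0.56 for ASA indicates weak discrimination compared with 0.71 for dfi. The quadratic pairwise loop is chosen for clarity; a rank-based computation would be preferable for large test sets.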
“…The train-test split is obtained by randomly shuffling the data and stratifying the two classes in both training and test portions. Given the significantly lower number of accident events relative to non-accident events, stratification by class percentage is used to maintain a similar class ratio in both training and test portions while shuffling the dataset during split (Sechidis et al., 2011; Stone, 2014). In doing so, the same sequence of random numbers is used to ensure similar splitting of the dataset in every analysis run.…”
Section: Data Processing and Preparation (mentioning)
Confidence: 99%
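
The stratified, seeded split described in this statement can be sketched as follows. This is an illustrative sketch only, not the citing authors' implementation: the function name `stratified_split`, the 20% test fraction, the seed value, and the 900/100 class counts are assumptions made for the example. Each class is shuffled separately with a fixed-seed generator, and the same fraction of each class is assigned to the test portion.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Return (train_idx, test_idx) with a similar class ratio in both parts."""
    rng = random.Random(seed)  # fixed seed: identical split on every run
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for y in sorted(by_class):  # deterministic class order
        idx = by_class[y]
        rng.shuffle(idx)  # shuffle within the class
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced data: 900 non-accident (0) and 100 accident (1) events.
labels = [0] * 900 + [1] * 100
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

test_positives = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_positives)
```

Because the minority class is sampled at the same rate as the majority class, both portions keep the original 10% accident rate, whereas a plain random split of a small or highly imbalanced data set can leave the test portion with very few minority examples.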