2010
DOI: 10.1186/1471-2105-11-447

Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms

Abstract: Background: Data generated using 'omics' technologies are characterized by high dimensionality, where the number of features measured per subject vastly exceeds the number of subjects in the study. In this paper, we consider issues relevant in the design of biomedical studies in which the goal is the discovery of a subset of features and an associated algorithm that can predict a binary outcome, such as disease status. We compare the performance of four commonly used classifiers (K-Nearest Neighbors, Prediction …

Cited by 76 publications (66 citation statements)
References 28 publications
“…Measuring statistically the minimum required amount of information in a given problem to guarantee a correct classification is a hard task. Related studies confirmed that the power of classifiers suffered a dramatic reduction in the event of both imbalance and lack of data [68].…”
Section: The Lack Of Data For the Map Stage In Imbalanced Classificat (mentioning)
confidence: 81%
“…While machine learning can exploit existing data to help inform several aspects of the study, an unresolved question is how to determine the sample size for these multivariate methods. Several methods, primarily for genetic data, have been proposed in the literature (Figueroa et al 2012; Guo et al 2010); however, a consensus has not yet been reached. Further work is required to assess these methods for machine learning in neuroimaging data.…”
Section: Discussion (mentioning)
confidence: 99%
“…Recent comparative studies of different classification algorithms [38,39] provide important insights into their behavior under the ‘dimensionality curse’ conditions; more empirical/simulation studies of this kind are needed. Similarly, Genetic Analysis Workshop 16 (GAW16) ([40] and references therein) proved very fruitful in comparing various variable selection and classification approaches as applied to the analysis of real standardized GWAS datasets.…”
Section: Machine Learning Methods (mentioning)
confidence: 99%