Biocomputing 2015 (2014)
DOI: 10.1142/9789814644730_0020

Variable Selection Method for the Identification of Epistatic Models

Abstract: Standard analysis methods for genome wide association studies (GWAS) are not robust to complex disease models, such as interactions between variables with small main effects. These types of effects likely contribute to the heritability of complex human traits. Machine learning methods that are capable of identifying interactions, such as Random Forests (RF), are an alternative analysis approach. One caveat to RF is that there is no standardized method of selecting variables so that false positives are reduced …

Cited by 16 publications (24 citation statements)
References 14 publications
“…There is no gold standard method for determining the threshold that best differentiates signal from noise (Holzinger et al., 2015). Expert consensus (Strobl et al., 2009) suggests that it is best not to interpret or compare importance scores but to rely on the relative rankings of the predictors.…”
Section: Methods
confidence: 99%
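The practice described above — comparing predictors by rank rather than by raw importance score — can be sketched as follows. The predictor names and score values are illustrative stand-ins, not output from the paper's actual analysis:

```python
# Hypothetical importance scores from a single Random Forest run.
# Following the consensus cited above (Strobl et al., 2009), predictors
# are compared by their relative ranking, not by the raw scores.
importances = {"SNP1": 0.042, "SNP2": 0.311, "SNP3": 0.007, "SNP4": 0.198}

# Sort predictors from most to least important.
ranked = sorted(importances, key=importances.get, reverse=True)
print(ranked)  # ['SNP2', 'SNP4', 'SNP1', 'SNP3']
```

Only the ordering is interpreted; the absolute magnitudes are treated as uncalibrated.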
“…As such, our approach provides an interpretive framework by identifying the set of predictors whose importance scores were consistently (i.e. in 100% of 5000 runs) above a standard threshold (Strobl et al., 2009; Holzinger et al., 2015) used for filtering out noise, thereby identifying the predictors that most consistently influence the outcome under study. As noted in Holzinger et al .…”
Section: Methods
confidence: 99%
“…Consequently, it was shown that repeating the machine learning analysis several times with different random number seeds is more reliable than a single run [10, 11]. Specifically, running a machine learning algorithm multiple times with different seeds generates a distribution of VIMr values across runs.…”
Section: Methods
confidence: 99%
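The multi-seed strategy these citing papers describe — re-running the analysis under different random seeds and keeping only predictors that clear a noise threshold in every run — can be sketched in miniature. Here `run_model`, the SNP names, the simulated scores, and the threshold of 0.05 are all illustrative assumptions, not the authors' actual pipeline:

```python
import random

# Toy sketch: repeat a "model" fit with different seeds and retain only
# predictors whose importance exceeds a noise threshold in all runs.
def run_model(seed):
    rng = random.Random(seed)
    # Simulated importance scores: SNP2 carries signal, SNP3 is noise.
    return {"SNP2": 0.30 + rng.uniform(-0.02, 0.02),
            "SNP3": 0.01 + rng.uniform(-0.02, 0.02)}

threshold = 0.05
runs = [run_model(seed) for seed in range(100)]

# Keep predictors above the threshold in 100% of runs.
consistent = [p for p in runs[0]
              if all(r[p] > threshold for r in runs)]
print(consistent)  # ['SNP2']
```

Across runs, the per-predictor scores form a distribution; filtering on consistency over that distribution is more stable than trusting any single seed.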
“…For tree-based machine learning methods such as RF, overfitting generally occurs if the trees are allowed to continue splitting to purity [10, 11]. In other words, if the trees are allowed to become very complex, they are likely to “overreact” to noise in the data.…”
Section: Methods
confidence: 99%
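The overfitting point above comes down to stopping rules that keep a tree from splitting all the way to purity. A minimal sketch of such rules follows; the parameter names (`max_depth`, `min_samples_leaf`) echo common RF implementations but are illustrative here, not a specific library's API:

```python
# Stopping rules that prevent a tree from growing to purity.
def should_stop(depth, n_samples, labels, max_depth=5, min_samples_leaf=10):
    if depth >= max_depth:                 # cap on tree complexity
        return True
    if n_samples < 2 * min_samples_leaf:   # too few samples to split further
        return True
    if len(set(labels)) == 1:              # node is already pure
        return True
    return False

print(should_stop(5, 100, [0, 1]))  # True: depth limit reached
print(should_stop(2, 100, [0, 1]))  # False: node may still be split
```

Without such limits a tree keeps splitting until every leaf is pure, memorizing noise in the training data.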