2016
DOI: 10.1186/s12859-016-0995-8

Do little interactions get lost in dark random forests?

Abstract: Background: Random forests have often been claimed to uncover interaction effects. However, if and how interaction effects can be differentiated from marginal effects remains unclear. In extensive simulation studies, we investigate whether random forest variable importance measures capture or detect gene-gene interactions. With capturing interactions, we define the ability to identify a variable that acts through an interaction with another one, while detection is the ability to identify an interaction effect as…

Cited by 127 publications (93 citation statements)
References: 39 publications
“…Network size was actually the most important variable in the wild type cognitive impairment model. Given that high Gini index can sometimes reflect interaction effects [52], we explored, post hoc, the joint variable importance [64] of all pairs of features in the wild type cognitive impairment final model. This revealed that larger tumor volume tended to be associated with smaller network size in patients with wild type tumor (r = −0.30, p = 0.09).…”
Section: Discussion (mentioning)
confidence: 99%
“…education level) with neural plasticity and repair [50, 51], we also examined a global efficiency*education interaction term. Variable importance was evaluated using mean decrease in Gini index [52]. Because classes (impaired vs. not impaired) were imbalanced in both groups, random minority over-sampling was employed [53, 54].…”
Section: Methods (mentioning)
confidence: 99%
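
The random minority over-sampling mentioned in the statement above can be illustrated with a minimal sketch. This is not the citing authors' code: the function name `random_oversample`, the toy data, and the seed are all hypothetical, and the sketch assumes the simplest variant, in which minority-class rows are duplicated at random until every class matches the size of the largest one.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (assumed): four "not impaired" rows and one "impaired" row.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
bal_rows, bal_labels = random_oversample(rows, labels)
balanced_counts = Counter(bal_labels)
```

Because the added rows are exact copies, over-sampling of this kind is normally applied to training data only, so that duplicates do not leak into the evaluation set.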
“…While our analyses did include epigenomic data, CpGs were not simulated to have main effects, as they only had interaction effects through corresponding SNPs. Interactions are particularly difficult to assess in RF when the interacting features lack main effects as they are unlikely to be selected and split on at all [16], which suggests that they are unlikely to rank highly in RF, and could explain why RF-RFE did not improve their rankings. Thus, we were unable to explicitly assess the performance of RF-RFE with this nongenetics omics data set.…”
Section: Discussion (mentioning)
confidence: 99%
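
The point made in the statement above, that interacting features without main effects are unlikely to be split on, can be checked with a toy calculation. The sketch below is not the simulation used in the cited paper; it assumes a pure XOR interaction between two binary features on a perfectly balanced grid, and the helper names `gini` and `gini_decrease` are hypothetical. It computes the Gini impurity decrease for a split on one interacting feature at the root, and then for a split on its partner inside one child node.

```python
from itertools import product


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def gini_decrease(x, y):
    """Impurity decrease from splitting binary labels y on binary feature x."""
    left = [yi for xi, yi in zip(x, y) if xi == 0]
    right = [yi for xi, yi in zip(x, y) if xi == 1]
    n = len(y)
    return gini(y) - len(left) / n * gini(left) - len(right) / n * gini(right)


# Balanced grid (assumed): each (a, b) combination appears 25 times.
grid = list(product([0, 1], repeat=2)) * 25
a = [g[0] for g in grid]
b = [g[1] for g in grid]
y = [ai ^ bi for ai, bi in zip(a, b)]  # outcome depends only on the interaction

# At the root, splitting on a alone yields no impurity decrease.
root_gain_a = gini_decrease(a, y)

# Inside the a == 0 child, splitting on b separates the classes completely.
sub = [(bi, yi) for ai, bi, yi in zip(a, b, y) if ai == 0]
child_gain_b = gini_decrease([s[0] for s in sub], [s[1] for s in sub])
```

Under these assumptions the root split on `a` gains nothing, so a greedy tree has no reason to prefer it over a noise variable, while the split on `b` becomes fully informative only after `a` has already been chosen.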
“…The other three causal CpG sites ranked very poorly in RF and did not suggest interactions. However, this was not unexpected as permutation importance scores were not designed to detect interactions [16] and reportedly fail to do so in high-dimensional data with weak marginal effects [17]. Although it has been shown that RF is influenced by interactions, it is very difficult to specifically identify which variables are interacting with current variable importance methods [16].…”
Section: Discussion (mentioning)
confidence: 99%
“…Both random forest and Logic Forest provide quantitative importance measures for individual predictors allowing them to be ranked according to their relative importance in determining an outcome [17,22]. However, predictor importance for each variable represents the marginal effect of a predictor and if a set of predictors is associated with the outcome only through interactions effects, these marginal importance measures may mask such interaction effects [23]. Unlike random forest, Logic Forest also provides a quantitative measure of importance for interactions identified by the forest, which is advantageous in complex disease settings where interactions among genetic and environmental factors rather than main effects lead to disease.…”
Section: Introduction (mentioning)
confidence: 99%