2017
DOI: 10.1109/lgrs.2017.2745049
A Systematic Approach for Variable Selection With Random Forests: Achieving Stable Variable Importance Values

Abstract: Random Forests variable importance measures are often used to rank variables by their relevance to a classification problem and subsequently reduce the number of model inputs in high-dimensional data sets, thus increasing computational efficiency. However, as a result of the way that training data and predictor variables are randomly selected for use in constructing each tree and splitting each node, it is also well known that if too few trees are generated, variable importance rankings tend to differ between …

Cited by 97 publications (56 citation statements)
References 20 publications
“…In the majority of studies, MDA is used [47] because it is considered more straightforward, reliable, and easier to understand [51]. However, Behnamian et al. [52] claimed that MDG is slightly more stable. Thus, we decided to use both methodologies.…”
Section: Variable Importance and Assessment of Temporal Patterns
confidence: 99%
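The two importance measures contrasted above can be sketched as follows. This is a minimal illustration using scikit-learn (an assumption — the cited studies may well use R's randomForest package): Mean Decrease in Gini (MDG) corresponds to scikit-learn's impurity-based `feature_importances_`, while Mean Decrease in Accuracy (MDA) corresponds to permutation importance.

```python
# Sketch: computing both RF importance measures with scikit-learn
# (assumed library; the cited studies may instead use R's randomForest).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Mean Decrease in Gini (MDG): impurity-based importance, built in.
mdg = rf.feature_importances_

# Mean Decrease in Accuracy (MDA): permutation importance,
# i.e. the drop in score when each feature is shuffled.
mda = permutation_importance(rf, X, y, n_repeats=10,
                             random_state=0).importances_mean

mdg_rank = sorted(range(8), key=lambda i: -mdg[i])
mda_rank = sorted(range(8), key=lambda i: -mda[i])
```

The two rankings often agree on the top variables but can differ in the tail, which is why some studies (as here) report both.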
“…Then, an increasing number of variables were removed in steps (i.e., r > 0.9, r > 0.8, r > 0.7, r > 0.6, and r > 0.5), assuming that a decrease in accuracy would occur if a given variable, or set of variables, provided valuable information (and thus should be retained). Note that we assumed that the Mean Decrease in Accuracy correctly identified the most important input among sets of correlated variables [64]; this value (averaged across 10 model runs to achieve stable variable importance measures [65]) was therefore used to identify which variable to retain, while all others were removed. After having created a set of uncorrelated variables, the 10 with the highest Mean Decrease in Accuracy ranking were used as inputs to a model.…”
Section: Applying the Random Forests Algorithm
confidence: 99%
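The procedure quoted above — averaging importance over 10 model runs, then discarding the weaker member of each highly correlated variable pair — can be sketched as below. All names, the 0.9 threshold, and the use of scikit-learn's permutation importance are illustrative assumptions, not the authors' code.

```python
# Sketch: stabilize importance by averaging over 10 RF runs with
# different seeds, then prune correlated variables (assumed workflow).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

runs = []
for seed in range(10):
    rf = RandomForestClassifier(n_estimators=500, random_state=seed).fit(X, y)
    runs.append(permutation_importance(rf, X, y, n_repeats=5,
                                       random_state=seed).importances_mean)
mean_importance = np.mean(runs, axis=0)  # averaged MDA across 10 runs

# For each pair correlated above the threshold (|r| > 0.9 here),
# retain the variable with the higher averaged importance.
corr = np.corrcoef(X, rowvar=False)
keep = set(range(X.shape[1]))
for i in range(X.shape[1]):
    for j in range(i + 1, X.shape[1]):
        if i in keep and j in keep and abs(corr[i, j]) > 0.9:
            keep.discard(j if mean_importance[i] >= mean_importance[j] else i)
```

Lowering the threshold stepwise (0.9, 0.8, …, 0.5), as the quoted study does, simply repeats the pruning loop with progressively stricter cutoffs.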
“…These were then used as inputs to a model. Similar to (i) and (iii), 10 model runs were used to achieve a stable variable importance ranking [65]. (iii) The ten remaining variables following the backward selection process.…”
Section: Applying the Random Forests Algorithm
confidence: 99%
“…Fourth, RF is less prone to overfitting than the Boosting method [18]. Moreover, RF measures the importance of variables automatically [25]. Finally, RF can obtain higher classification accuracy than other well-known classifiers such as SVM [5,6] and maximum likelihood (ML) [17,23], with fewer parameters. Active learning is an iterative method that queries the most informative samples for manual labeling at each iteration [14].…”
confidence: 99%