2015
DOI: 10.32614/rj-2015-018

VSURF: An R Package for Variable Selection Using Random Forests

Abstract: This paper describes the R package VSURF. Based on random forests, and for both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables including some redundancy which can be relevant for interpretation, and the second one is a smaller subset corresponding to a model trying to avoid redundancy focusing more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the random …

Cited by 500 publications (422 citation statements)
References 35 publications (33 reference statements)
“…In addition, we used the R package VSURF (73) (Variable Selection using Random Forests) for feature selection on the 37 phyla. This method uses Random Forests, which are an ensemble approach from machine learning that rank the importance of features in terms of their ability to classify a variable of interest, while taking into account the complex interrelationships of the features (74).…”
Section: Methods (mentioning)
confidence: 99%
“…We used the R package VSURF (73) (Variable Selection using Random Forests) for feature selection on the 37 phyla. Random forest analysis identified three phyla (Actinobacteria, Lentisphaerae, and Verrucomicrobia) as highly important to classify PTSD versus TE controls.…”
Section: Random Forest/vsurf (mentioning)
confidence: 99%
“…This was done in order to obtain a reduced variable set for both interpretation and prediction issues. Therefore, we applied the variable elimination process described by Genuer et al. (2015) on the same data as for RF_all. In contrast to the former approach, we here included the geographic coordinates (x = easting, y = northing; UTM ETRS89) of the FC in order to account for the spatial distribution of the feeding plots.…”
Section: Methods (mentioning)
confidence: 99%
“…Variables are thus kept or eliminated from the nested model according to a threshold of the minimum error gain relating to the “out-of-bag” error (OOB). The threshold was calculated by the mean of the first-order differentiated OOB errors (see Genuer et al., 2015). Using the ‘VSURF’ package (Genuer et al., 2015), we provided an additional variable selection suggested for prediction (RF_VSURF).…”
Section: Methods (mentioning)
confidence: 99%
“…Random forests are used for variable selection using the 'VSURF' package (Genuer et al., 2016). The variable importance plots for each predictor variable and for both ranges of relative ammonia values are obtained using this function.…”
Section: Data Analyses (mentioning)
confidence: 99%