2018
DOI: 10.48550/arxiv.1805.04755
Preprint

A Simple and Effective Model-Based Variable Importance Measure

Abstract: In the era of "big data", it is becoming more of a challenge to not only build state-of-the-art predictive models, but also gain an understanding of what's really going on in the data. For example, it is often of interest to know which, if any, of the predictors in a fitted model are relatively influential on the predicted outcome. Some modern algorithms, like random forests and gradient boosted decision trees, have a natural way of quantifying the importance or relative influence of each feature. Other algorith…

Cited by 54 publications (79 citation statements)
References 16 publications
“…This method is based on partial dependency of input features. Essentially, it can be derived from PDPs that input features that have more variability in their PDP, are more influential in the final predictions made by the ML model (Greenwell, 2018). Consequently, the features for which the PDP is flat is likely to be less important than input variables with more variable PDP across range of their values.…”
Section: Feature Importance (mentioning)
confidence: 99%
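
The idea in the statement above, that a feature whose partial dependence curve varies more is more influential, can be illustrated with a minimal sketch. Everything named here (`partial_dependence`, `pdp_importance`, `toy_predict`, the quantile grid, and the toy data) is hypothetical and introduced only for illustration. The sample standard deviation of the curve is used as one possible measure of variability; the exact measure in the cited paper may differ.

```python
import numpy as np


def partial_dependence(predict, X, j, grid):
    """Average prediction when column j is forced to each value in grid."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v  # overwrite feature j for every row
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)


def pdp_importance(predict, X, n_grid=20):
    """Score each feature by the spread of its partial dependence curve.

    Assumption: spread is measured with the sample standard deviation.
    A flat curve gives a score near zero.
    """
    scores = []
    for j in range(X.shape[1]):
        # Evaluate on a quantile grid so every feature uses the same number of points.
        grid = np.quantile(X[:, j], np.linspace(0.05, 0.95, n_grid))
        scores.append(partial_dependence(predict, X, j, grid).std(ddof=1))
    return np.array(scores)


# Toy setup (hypothetical): feature 0 has a strong effect, feature 1 a weak
# effect, and feature 2 has no effect on the prediction.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 3))


def toy_predict(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]


scores = pdp_importance(toy_predict, X)
ranking = np.argsort(scores)[::-1]  # most important feature first
```

In this toy setup the unused feature has a flat curve and so receives a score of essentially zero, while the other two features are ranked by the size of their effect. Any fitted model that exposes a prediction function over a NumPy array could be passed in place of `toy_predict`.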
“…A bivariate importance measure, perhaps obtained by permuting pairs of variables, could be used in place of the H-statistic in the heatmap and network visualizations. It would also be interesting to explore the interaction measures of Hooker (2004) and Greenwell et al (2018) in our visualizations, and whether these measures avoid the issues identified with the use of H.…”
Section: Discussion (mentioning)
confidence: 99%
“…It is also useful to summarize the main and interaction ALEs with a one-number summary that can be used to rank the importance of each effect. Following Greenwell et al (2018), we propose to measure overall variable importance (VI) for continuous covariates using the standard deviation of the ALE with respect to the marginal distribution of X, i.e.,…”
Section: I-spline Basis Expansion (mentioning)
confidence: 99%