2021
DOI: 10.1007/s42452-021-04148-9
Comparison of feature importance measures as explanations for classification models

Abstract: Explainable artificial intelligence is an emerging research direction helping the user or developer of machine learning models understand why models behave the way they do. The most popular explanation technique is feature importance. However, there are several different approaches how feature importances are being measured, most notably global and local. In this study we compare different feature importance measures using both linear (logistic regression with L1 penalization) and non-linear (random forest) me…
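The abstract contrasts different ways of measuring feature importance, notably global versus local measures. One widely used global, model-agnostic measure is permutation importance: the drop in a model's score when a single feature's values are shuffled. The sketch below illustrates the idea on synthetic data; the toy model, feature coefficients, and data are illustrative assumptions, not taken from the paper:

```python
import random

# Toy data: the label depends strongly on x0, weakly on x1, not at all on x2.
random.seed(0)
X = [[random.random(), random.random(), random.random()] for _ in range(200)]
y = [1 if 2.0 * x0 + 0.3 * x1 > 1.1 else 0 for x0, x1, _x2 in X]

def predict(row):
    # A fixed "model": the same linear rule that generated the labels.
    return 1 if 2.0 * row[0] + 0.3 * row[1] > 1.1 else 0

def accuracy(rows, labels):
    return sum(predict(r) == t for r, t in zip(rows, labels)) / len(labels)

def permutation_importance(rows, labels, col):
    """Global importance of one feature: accuracy drop after shuffling that column."""
    base = accuracy(rows, labels)
    shuffled = [r[:] for r in rows]
    values = [r[col] for r in shuffled]
    random.shuffle(values)
    for r, v in zip(shuffled, values):
        r[col] = v
    return base - accuracy(shuffled, labels)

importances = [permutation_importance(X, y, j) for j in range(3)]
# x0 should dominate; x2 is unused by the model, so its importance is exactly 0.
```

Because the unused feature x2 never enters the decision rule, shuffling it cannot change any prediction, which is why a model-agnostic measure like this assigns it zero importance regardless of how the model is trained.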

Cited by 199 publications (125 citation statements)
References 38 publications (49 reference statements)
“…By now, however, “Explainable AI” is a dedicated branch in ML research, and numerous model-specific and model-agnostic methods are available that can partially explain ML prediction outcomes ( 32 ). Two common ways to explain model performance is to analyze the distribution of input samples ( 4 , 33 ), and to analyze feature importance ( 34 ), especially in a clinical setting ( 35 ).…”
Section: Methods
confidence: 99%
“…Obviously, not every classification algorithm yields the same ranking for feature importances. It is argued that a combination of several feature importance rankings can provide more reliable and trustworthy [explanations] ( 34 ). Therefore, for our report to the expert, we aim at presenting a single table with the top 10 most important features for the given classification problem.…”
Section: Methods
confidence: 99%
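The citing paper above combines several feature importance rankings into a single top-10 table. One simple way to do this is mean-rank aggregation, sketched below; the feature names and per-model rankings are hypothetical, and mean-rank is only one of several possible aggregation schemes:

```python
# Hypothetical per-model rankings, best feature first; not data from any paper.
rankings = {
    "logreg_l1":   ["age", "bmi", "glucose", "bp", "insulin"],
    "rf_gini":     ["glucose", "bmi", "age", "insulin", "bp"],
    "permutation": ["glucose", "age", "bmi", "bp", "insulin"],
}

def mean_rank(rankings):
    """Average each feature's 1-based rank position across all rankings."""
    features = next(iter(rankings.values()))
    return {f: sum(r.index(f) + 1 for r in rankings.values()) / len(rankings)
            for f in features}

scores = mean_rank(rankings)
# Lower mean rank = more important; slicing [:10] would give a top-10 table.
combined = sorted(scores, key=scores.get)
```

Here "glucose" ranks first overall (mean rank 5/3) even though one model places it third, which is the kind of smoothing a combined table provides when individual rankings disagree.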
“…The variable importance [or feature importance; (Saarela and Jauhiainen, 2021)] represents the relative influence of each input variable in the model predictions and can be estimated using both RF and XGB models. Because the previous results indicate the best performance is obtained using the combined flow data set (with flow regime as a categorical input variable), we focus here on the base configuration (C0) with all flows combined.…”
Section: Variable Importance - Base Configuration With Combined Flows
confidence: 99%
“…Here, those key variables identified as most important differ depending on the ML technique used, and while they represent physical attributes or processes that are well known to influence HEFs and TTDs, it is not yet clear to what degree these measures are indicative of actual physical controls. While understanding feature importance remains an active research area in explainable AI, it has been suggested that combining multiple ML methods is necessary to increase interpretability of the predictions (Saarela and Jauhiainen, 2021). In this work, the importance of several input variables was identified by both ML models and is consistent with understanding from previous studies, including river bathymetry features (Cardenas and Wilson, 2007; Tonina and Buffington, 2007; Stonedahl et al, 2013), flow regime (Sawyer …”
[Figure 5: ML model prediction R² on testing dataset for single-ML and flow regime-specific ML models; top panel: RF model; bottom panel: XGB model.]
Section: Variable Importance - Base Configuration With Combined Flows
confidence: 99%
“…This is what was done in Figures 4 and 5. For other measures of feature importance, see Saarela and Jauhiainen, 2021. For the Hedonic feature set (as defined in Section 5.1), we obtained the rankings described in Figures 4 and 5, for the Random Forests and the Gradient Boosting regressors, respectively.…”
Section: Feature Selection
confidence: 99%