2023
DOI: 10.1145/3583558

From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI

Abstract: The rising popularity of explainable artificial intelligence (XAI) to understand high-performing black boxes raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider it a multi-faceted concept. We identify 12 conceptual properties, such as Compactness and Correctness, that should be evaluated for comprehensively assessing the quality of an explanation. Our so-called…

Cited by 126 publications (69 citation statements)
References 191 publications
“…Recent work in the NLG community has aimed to provide an overview of our evaluation practices, and move towards standardising our terminology and assessment materials (Belz et al., 2020; Howcroft et al., 2020). There have been similar efforts in the areas of Explainable AI (Nauta et al., 2022) and Intelligent Virtual Agents (Fitrianie et al., 2019, 2020). The majority of our respondents indicated that they would be more likely to carry out an error analysis if there were an existing taxonomy of errors that they could use.…”
Section: Error Taxonomies and Standardization (mentioning)
confidence: 62%
“…By definition, algorithm-generated explanations of ML models have to be understandable by humans. This concept, tightly coupled with the amount of information included in an explanation, has been referred to as comprehensibility [14], conciseness [3], and compactness [22], amongst others. Local linear explanations provide insight into how a black-box classifier weights its input features to arrive at the classification outcome for a given sample.…”
Section: Discussion (mentioning)
confidence: 99%
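The statement above describes local linear explanations only in prose; the following is a minimal sketch of how such an explanation could be computed around a single sample, in the spirit of surrogate methods like LIME. The function name `black_box_predict_proba`, the Gaussian perturbation scheme, and the kernel width are illustrative assumptions, not the method of any particular cited paper.

```python
# Minimal sketch: fit a weighted linear surrogate around one sample so its
# coefficients approximate how the black box weights each input feature locally.
import numpy as np
from sklearn.linear_model import Ridge

def local_linear_explanation(black_box_predict_proba, x, n_samples=1000,
                             kernel_width=0.75, rng=None):
    rng = rng or np.random.default_rng(0)
    # Perturb the sample with Gaussian noise to probe the local decision surface.
    Z = x + rng.normal(scale=0.1, size=(n_samples, x.shape[0]))
    y = black_box_predict_proba(Z)[:, 1]          # probability of the class of interest
    # Weight perturbations by proximity to the original sample.
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # The coefficients of the weighted ridge regression serve as the explanation.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=weights)
    return surrogate.coef_
```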
“…We test the resulting explanations by measuring the classifier’s prediction with only the important words (Fidelity) and without them (Deletion Check) [46]. If the explanation method selected words that are relevant to the classifier, one would expect the accuracy to stay close to that of the original model when only the important words are kept, and to drop significantly when the important words are removed.…”
Section: Methods (mentioning)
confidence: 99%
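Below is a rough sketch of how these two checks could be computed for a text classifier. The names `predict` and `important_words_per_text`, and the use of label accuracy as the comparison metric, are assumptions for illustration rather than the exact protocol of the cited work.

```python
# Minimal sketch of the Fidelity and Deletion Check evaluations:
# compare accuracy when keeping only the explanation's important words
# versus removing them, against accuracy on the unmodified texts.
import numpy as np

def _keep_only(text, important_words):
    return " ".join(w for w in text.split() if w in important_words)

def _delete(text, important_words):
    return " ".join(w for w in text.split() if w not in important_words)

def fidelity_and_deletion_check(predict, texts, labels, important_words_per_text):
    labels = np.asarray(labels)
    original = np.mean(np.asarray(predict(list(texts))) == labels)
    # Fidelity: accuracy with only the important words; should stay close to `original`.
    kept = [_keep_only(t, iw) for t, iw in zip(texts, important_words_per_text)]
    fidelity = np.mean(np.asarray(predict(kept)) == labels)
    # Deletion Check: accuracy with the important words removed; should drop clearly.
    deleted = [_delete(t, iw) for t, iw in zip(texts, important_words_per_text)]
    deletion = np.mean(np.asarray(predict(deleted)) == labels)
    return {"original": original, "fidelity": fidelity, "deletion_check": deletion}
```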