2022
DOI: 10.1016/j.inffus.2021.11.008
CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations

Cited by 80 publications (61 citation statements)
References 41 publications
“…In this example, we assume to have a pre-trained model (model), a batch of input and output pairs (x_batch, y_batch) and a set of attributions (a_batch). Needless to say, XAI evaluation is intrinsically difficult and there is no one-size-fits-all metric for all tasks - evaluation of explanations must be understood and calibrated from its context: the application, data, model, and intended stakeholders [10,34]. To this end, we designed Quantus to be highly customisable and easily extendable - documentation and examples on how to create new metrics as well as how to customise existing ones are included.…”
Section: Library Design (mentioning)
confidence: 99%
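
The usage pattern described in this excerpt can be illustrated with a short sketch. It is only a minimal sketch under assumptions: the stand-in model and random data replace the "pre-trained model" and batches the excerpt presumes, and the specific metric class (quantus.FaithfulnessCorrelation) and its constructor arguments are assumptions about the Quantus API rather than details given in the excerpt.

import numpy as np
import torch
import quantus

# Tiny stand-in classifier and random batches so the sketch is self-contained.
# In practice, model, x_batch, y_batch and a_batch come from your own pipeline,
# as assumed in the excerpt above.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x_batch = np.random.rand(8, 3, 32, 32).astype(np.float32)
y_batch = np.random.randint(0, 10, size=8)
a_batch = np.random.rand(8, 3, 32, 32).astype(np.float32)  # placeholder attributions

# Assumed metric class and arguments; Quantus metric instances are callable on a batch.
metric = quantus.FaithfulnessCorrelation(nr_runs=10, subset_size=64)
scores = metric(model=model, x_batch=x_batch, y_batch=y_batch,
                a_batch=a_batch, device="cpu")
print("mean score:", float(np.mean(scores)))

The same call pattern applies to other metric instances, which is what makes the library easy to extend with custom metrics.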
“…Despite much excitement and activity in the field of eXplainable Artificial Intelligence (XAI) [1,2,3,4,5], the evaluation of explainable methods still remains an unsolved problem [6,7,8,9,10]. Unlike in traditional Machine Learning (ML), the task of explaining inherently lacks "ground-truth" data - there is no universally accepted definition of what constitutes a "correct" explanation and less so, which properties an explanation ought to fulfill [11].…”
Section: Introduction (mentioning)
confidence: 99%
“…A limitation that the synthetic dataset of Arras et al. (2021) also shares. Arras et al. (2021) created a dataset to technically evaluate saliency methods on a visual question answering task. TWO4TWO is the first dataset designed explicitly for human subject evaluations.…”
Section: Related Work (mentioning)
confidence: 99%
“…It can be used for testing the quality of explanations and concept learning. Additionally, [6] proposed the CLEVR-XAI-simple and CLEVR-XAI-complex datasets which provide ground-truth segmentation information for heatmap-based visual explanations. Our CLEVR-X augments the existing CLEVR dataset with explanations, but in contrast to (heatmap-based) visual explanations, we focus on natural language explanations.…”
Section: Related Work (mentioning)
confidence: 99%
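
The ground-truth segmentation masks mentioned in this excerpt can be turned into a simple quantitative score for heatmap explanations. The sketch below computes relevance mass accuracy, one of the measures used in the CLEVR-XAI paper: the fraction of total attribution mass that falls inside the ground-truth mask. Restricting the computation to positive relevance and the variable names are illustrative assumptions.

import numpy as np

def relevance_mass_accuracy(heatmap, gt_mask):
    # heatmap: 2-D array of attribution scores (H, W)
    # gt_mask: 2-D boolean array (H, W), True inside the ground-truth region
    relevance = np.clip(heatmap, 0, None)  # keep positive evidence only (assumption)
    total = relevance.sum()
    if total == 0:
        return 0.0
    return float(relevance[gt_mask].sum() / total)

# Toy example: a 4x4 heatmap whose mass lies entirely inside the masked corner.
heatmap = np.zeros((4, 4)); heatmap[:2, :2] = 1.0
gt_mask = np.zeros((4, 4), dtype=bool); gt_mask[:2, :2] = True
print(relevance_mass_accuracy(heatmap, gt_mask))  # -> 1.0

A score near 1.0 indicates that the explanation concentrates its evidence on the ground-truth region, which is the kind of comparison the CLEVR-XAI-simple and CLEVR-XAI-complex datasets enable.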
“…selected; (3) A CAPTCHA [3] to verify that the user is human; (4) The problem definition consisting of a question and an image; (5) A user qualification step, for which the user has to correctly answer a question about an image. This ensures that the user is able to answer the question in the first place, a necessary condition to participate in our user study; (6) Two explanations from which the user needs to choose one. Example screenshots of the user interface for the user study are shown in Fig.…”
Section: User Study On Explanation Completeness and Relevance (mentioning)
confidence: 99%