2021
DOI: 10.1002/ail2.55

Generating visual explanations with natural language

Abstract: We generate natural language explanations for a fine-grained visual recognition task. Our explanations fulfill two criteria. First, explanations are class discriminative, meaning they mention attributes in an image which are important to identify a class. Second, explanations are image relevant, meaning they reflect the actual content of an image. Our system, composed of an explanation sampler and phrase-critic model, generates class discriminative and image relevant explanations. In addition, we demonstrate t…
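The abstract describes a two-stage pipeline: an explanation sampler proposes candidate sentences and a phrase-critic model reranks them by how well they are grounded in the image. The sketch below illustrates only that sample-then-rerank idea; the candidate sentences, the toy attribute "detections", and all function names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the sample-then-rerank pipeline described in the abstract.
# The candidate sentences, the toy attribute "detections", and every function
# name here are illustrative assumptions, not the authors' implementation.
from typing import List, Tuple


def sample_candidate_explanations(predicted_class: str, k: int = 3) -> List[str]:
    """Stand-in for an explanation sampler (e.g. a conditioned language model)."""
    return [
        f"This is a {predicted_class} because it has a red head and a black throat.",
        f"This is a {predicted_class} because it has a long neck and webbed feet.",
        f"This is a {predicted_class} because it is a small bird.",
    ][:k]


def phrase_critic_score(explanation: str, visible_attributes: set) -> float:
    """Stand-in phrase critic: fraction of mentioned attributes that are
    actually grounded in the image (here faked with a fixed attribute set)."""
    if "because it has" not in explanation:
        return 0.0
    mentioned = explanation.split("because it has")[-1]
    phrases = [p.strip(" .") for p in mentioned.split(" and ")]
    grounded = sum(1 for p in phrases if p in visible_attributes)
    return grounded / max(len(phrases), 1)


def explain(predicted_class: str, visible_attributes: set) -> Tuple[float, str]:
    """Sample candidates, rerank with the critic, return the best explanation."""
    candidates = sample_candidate_explanations(predicted_class)
    return max((phrase_critic_score(c, visible_attributes), c) for c in candidates)


# Toy "detections" for one image; in the real system these would come from a grounding model.
print(explain("Vermilion Flycatcher", {"a red head", "a black throat"}))
```

The critic prefers the candidate whose mentioned attributes are all visible, which is the class-discriminative and image-relevant behaviour the abstract claims.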

Cited by 14 publications (14 citation statements) | References 36 publications

“…Many interpretability methods developed to have causal-flavor are for local explanations, such as removing and adding pixels to generate counterfactual explanations for images [5,1] or for texts [7].…”
Section: Related Work (mentioning)
confidence: 99%
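The statement above refers to local counterfactual explanations produced by removing or adding pixels. A minimal toy sketch of that idea, assuming a stand-in classifier and a greedy single-patch search (neither comes from the cited works):

```python
# Toy sketch of a pixel-removal counterfactual in the spirit of the quoted
# statement: zero out image patches until the classifier's decision flips.
# The tiny "classifier" and the single-patch search are illustrative only.
import numpy as np


def toy_classifier(image: np.ndarray) -> int:
    """Predicts class 1 if the top-left quadrant is mostly bright, else class 0."""
    return int(image[:8, :8].mean() > 0.8)


def single_patch_counterfactual(image: np.ndarray, patch: int = 4) -> np.ndarray:
    """Return the image with the first single patch removed that flips the prediction."""
    original = toy_classifier(image)
    h, w = image.shape
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            trial = image.copy()
            trial[y:y + patch, x:x + patch] = 0.0   # "remove" this patch
            if toy_classifier(trial) != original:
                return trial
    return image  # no single-patch counterfactual exists


image = np.zeros((16, 16))
image[:8, :8] = 1.0                      # bright top-left quadrant -> class 1
edited = single_patch_counterfactual(image)
print(toy_classifier(image), toy_classifier(edited))   # 1 0
```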
“…In [18], a textual explanation model is proposed to investigate why the model predicted a class x instead of class X based on counterfactuals. They offered a "phrase-critic model" using explanation annotations and its counterparts.…”
Section: B (mentioning)
confidence: 99%
“…Lack of alignment in the image-text pairs emphasizes that mere accuracy is not enough if it is not accompanied by a valid justification, i.e., to be right for the right reason [16]. Explainability efforts are also carried out by generating visual [17] and textual explanations [18], but they can result in multiple explanations with varied evaluation metrics. Modular approaches make the system interpretable by design [19], [20] but are majorly tested on synthetic datasets such as CLEVR [21].…”
Section: Introduction (mentioning)
confidence: 99%
“…The explanation provides only the most critical changes (i.e., adding wool and removing beard and horns) required to alter the model's prediction from Goat to Sheep, though several other changes may be necessary. While there are recent works on generating pixel-level counter-factual and contrastive explanations [41], [42], [43], to the best of our knowledge, this is the first work to propose a method for generating explanations that are iterative, counter-factual as well as conceptual.…”
Section: Explanation Is An Interactive Communication Process (mentioning)
confidence: 99%
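The Goat-to-Sheep example in the statement above is a concept-level counterfactual: the smallest set of attribute edits that flips the prediction. A toy sketch under that reading, with an illustrative rule-based classifier and attribute set (not the cited method):

```python
# Toy sketch of a concept-level counterfactual, following the Goat -> Sheep
# example in the quoted statement: find the smallest set of attribute edits
# that changes the prediction.  The attribute names and the rule-based
# "classifier" are illustrative assumptions, not the cited method.
from itertools import combinations

ATTRIBUTES = ("wool", "beard", "horns")


def toy_classifier(concepts: dict) -> str:
    """Rule-based stand-in: wool without beard or horns -> Sheep, otherwise Goat."""
    if concepts["wool"] and not concepts["beard"] and not concepts["horns"]:
        return "Sheep"
    return "Goat"


def concept_counterfactual(concepts: dict, target: str) -> dict:
    """Return the smallest set of attribute flips whose result is the target class."""
    for k in range(1, len(ATTRIBUTES) + 1):
        for subset in combinations(ATTRIBUTES, k):
            edited = dict(concepts)
            for attr in subset:
                edited[attr] = not edited[attr]
            if toy_classifier(edited) == target:
                return {attr: edited[attr] for attr in subset}
    return {}  # no counterfactual reachable by flipping these attributes


goat = {"wool": False, "beard": True, "horns": True}
print(concept_counterfactual(goat, "Sheep"))
# {'wool': True, 'beard': False, 'horns': False}: add wool, remove beard and horns
```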