2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00211

Fooling Network Interpretation in Image Classification

Abstract: Deep neural networks have been shown to be fooled rather easily using adversarial attack algorithms. Practical methods such as adversarial patches have been shown to be extremely effective in causing misclassification. However, these patches are highlighted using standard network interpretation algorithms, thus revealing the identity of the adversary. We show that it is possible to create adversarial patches which not only fool the prediction, but also change what we interpret regarding the cause of the predic…
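To make the idea concrete, below is a minimal sketch (not the authors' released code) of how such a patch could be optimized: the patch is trained both to flip the prediction and to keep the Grad-CAM map away from the patch region. The backbone, target class, patch placement, and equal loss weighting are all illustrative assumptions.

# Minimal sketch, assuming a ResNet-50 backbone, a fixed top-left 50x50 patch,
# and an equal weighting of the two loss terms -- all illustrative choices.
import torch
import torch.nn.functional as F
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).to(device).eval()

def gradcam(model, x, target_class, layer):
    """Simplified Grad-CAM on `layer` for `target_class`; returns (logits, cam)."""
    feats = {}
    def hook(_module, _inp, out):
        feats["a"] = out
    handle = layer.register_forward_hook(hook)
    logits = model(x)
    handle.remove()
    score = logits[:, target_class].sum()
    grads = torch.autograd.grad(score, feats["a"], create_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)          # global-average-pooled gradients
    cam = F.relu((weights * feats["a"]).sum(dim=1))         # (N, h, w)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
    return logits, cam

# Hypothetical image, target class, and patch placement.
x_clean = torch.rand(1, 3, 224, 224, device=device)
target = torch.tensor([0], device=device)
mask = torch.zeros(1, 1, 224, 224, device=device)
mask[..., :50, :50] = 1.0                                   # patch occupies the top-left corner
patch = torch.rand(1, 3, 50, 50, device=device, requires_grad=True)
opt = torch.optim.Adam([patch], lr=0.05)

for _ in range(200):
    padded = F.pad(patch.clamp(0, 1), (0, 174, 0, 174))     # place the 50x50 patch on a 224x224 canvas
    x_adv = x_clean * (1 - mask) + mask * padded
    logits, cam = gradcam(model, x_adv, target.item(), model.layer4)
    cam_full = F.interpolate(cam.unsqueeze(1), size=224, mode="bilinear", align_corners=False)
    loss_cls = F.cross_entropy(logits, target)              # push the prediction toward the target class
    loss_cam = (cam_full * mask).sum() / mask.sum()         # hide the patch from Grad-CAM
    loss = loss_cls + loss_cam
    opt.zero_grad()
    loss.backward()
    opt.step()

The second loss term is what distinguishes this from a standard adversarial patch, which would only minimize the classification loss and would therefore remain visible to the interpretation.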

Cited by 59 publications (51 citation statements) · References 26 publications (40 reference statements)
“…Due to the lack of ground truth, we do not know which pixel is in fact important to a model. Existing evaluation methods can be classified into three categories, namely, removing pixel features [17,18,39], setting relative ground truth [20,26,40] and user-oriented measurement [41,42].…”
Section: Evaluation Methods (mentioning, confidence: 99%)
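As an illustration of the first category mentioned in this excerpt (removing pixel features), here is a hedged sketch of a deletion-style evaluation; the zero baseline and ten-step schedule are arbitrary choices, not the protocol of any specific reference above.

# Hedged sketch of a deletion-style evaluation: zero out the pixels a saliency
# map ranks as most important and watch how the model's confidence in its
# original prediction decays.
import torch
import torch.nn.functional as F

@torch.no_grad()
def deletion_curve(model, image, saliency, steps=10):
    """image: (1, 3, H, W); saliency: (H, W) importance scores."""
    pred = model(image).argmax(dim=1)                       # class to track
    order = saliency.flatten().argsort(descending=True)     # most important pixels first
    x = image.clone()
    per_step = order.numel() // steps
    curve = []
    for s in range(steps):
        idx = order[s * per_step:(s + 1) * per_step]
        x.view(1, 3, -1)[..., idx] = 0.0                    # "remove" these pixels
        prob = F.softmax(model(x), dim=1)[0, pred].item()
        curve.append(prob)
    return curve                                            # a faster drop suggests a more faithful map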
“…Another approach is to set the ground truth from a different perspective. Akshayvarun et al. [20] introduce adversarial patches as the true cause of a prediction and show that the Grad-CAM interpretation method is unreliable and easily fooled by such adversarial examples. Mengjiao et al. [26] construct a carefully designed semi-natural dataset by pasting object pixels into scene images and train models on this dataset.…”
Section: Setting Relative Ground Truth (mentioning, confidence: 99%)
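A hedged sketch of how such a relative ground truth can be scored: with the adversarial patch location known, one can measure how much of the saliency mass falls inside the patch region. The energy ratio below is an illustrative metric, not the exact measure used in [20] or [26].

# Hedged sketch: with the patch location known, score how much saliency mass
# falls inside it.
import torch

def energy_in_region(saliency: torch.Tensor, region_mask: torch.Tensor) -> float:
    """saliency: (H, W) non-negative map; region_mask: (H, W) binary mask of the patch."""
    saliency = saliency.clamp(min=0)
    return (saliency * region_mask).sum().item() / (saliency.sum().item() + 1e-8)

A fooled interpretation yields a low ratio even though the patch caused the misclassification, while a faithful one keeps the ratio high.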
“…Pruthi et al. (2020) manipulate attention distributions in an end-to-end fashion; we focus on manipulating gradients. It is worth noting that we perturb models to manipulate interpretations; other work perturbs inputs (Ghorbani et al., 2019; Dombrowski et al., 2019; Subramanya et al., 2019). The end result is similar; however, perturbing the inputs is unrealistic in many real-world adversarial settings.…”
Section: Related Work (mentioning, confidence: 99%)
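A hedged sketch of the model-perturbation view described in this excerpt: fine-tune the model with a loss that keeps its predictions close to a frozen copy while steering its input-gradient saliency toward an arbitrary target map. The specific loss terms and weighting are assumptions, not the cited papers' exact objectives.

# Hedged sketch of perturbing the *model* so that its input-gradient saliency
# changes while its predictions stay close to a frozen copy.
import torch
import torch.nn.functional as F

def manipulation_loss(model, frozen_model, x, target_saliency, lam=1.0):
    """x: (N, 3, H, W); target_saliency: (N, H, W) map the attacker wants shown."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    with torch.no_grad():
        ref_logits = frozen_model(x)
    # Fidelity: keep the manipulated model's outputs close to the original.
    fidelity = F.kl_div(F.log_softmax(logits, dim=1),
                        F.softmax(ref_logits, dim=1), reduction="batchmean")
    # Steering: push the input-gradient saliency toward the attacker's target map.
    score = logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum()
    grad = torch.autograd.grad(score, x, create_graph=True)[0].abs().sum(dim=1)
    grad = grad / (grad.amax(dim=(1, 2), keepdim=True) + 1e-8)
    steer = F.mse_loss(grad, target_saliency)
    return fidelity + lam * steer

Minimizing this loss over the model's parameters in a standard fine-tuning loop leaves predictions nearly unchanged but rewrites what the saliency map shows, which is exactly the contrast with input perturbation drawn in the excerpt above.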
“…However, recent studies have proposed several attack methods showing that some XAI models can also be easily attacked. Examples include the input gradient [14], meaningful perturbation [15], fooling network interpretation [16], adversarial model manipulation [17], and deceiving local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP) [18].…”
Section: Introduction (mentioning, confidence: 99%)