2020
DOI: 10.48550/arxiv.2012.10425
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Towards Robust Explanations for Deep Neural Networks

Abstract: Explanation methods shed light on the decision process of black-box classifiers such as deep neural networks. But their usefulness can be compromised because they are susceptible to manipulations. With this work, we aim to enhance the resilience of explanations. We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model. Based on these theoretical insights, we present three different techniques to boost robustness against manipulation: training with weight decay, sm… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 35 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?