Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021
DOI: 10.1145/3459637.3482126
Grad-SAM

Cited by 13 publications (5 citation statements) | References 28 publications
“…Some more recent works have also proposed versions of post-hoc algorithms tailored for the transformer model. 27,28…”
Section: Transformer Approach
confidence: 99%
“…However, this summary still only includes the attention layers and neglects all other network components [47]. In response, various improvements over attention rollout have been proposed, such as GradSAM [48] or an LRP-based explanation method [49], that were designed to more accurately reflect the computations of all model components.…”
Section: Explaining Attention-based Models
confidence: 99%
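For context, the attention-rollout baseline that this statement contrasts with Grad-SAM aggregates raw attention maps across layers by recursive matrix multiplication. The following is a minimal sketch, not taken from the cited papers, assuming per-layer attention arrays of shape (heads, seq, seq):

import numpy as np

def attention_rollout(attentions):
    """Attention rollout: recursively multiply per-layer attention maps,
    adding the identity to approximate residual connections.
    `attentions` is a list of (heads, seq, seq) arrays, one per layer."""
    seq_len = attentions[0].shape[-1]
    rollout = np.eye(seq_len)
    for layer_attn in attentions:
        attn = layer_attn.mean(axis=0)                   # average over heads
        attn = attn + np.eye(seq_len)                    # residual connection
        attn = attn / attn.sum(axis=-1, keepdims=True)   # renormalize rows
        rollout = attn @ rollout                         # compose with earlier layers
    return rollout  # (seq, seq) token-to-token relevance estimate

As the statement notes, this summary uses only the attention layers, which is the gap that gradient-weighted and LRP-based methods aim to close.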
“…For instance, Chefer et al (2021) utilize the Taylor Decomposition principle to assign and propagate a local relevance score through the layers of a ViT model. Similarly, Sun et al (2021) and Barkan et al (2021) employ attention gradient weighting on ViT and BERT models, respectively. However, these approaches primarily focused on the attention weight of the "cls" token, and the latter two methods weighed each token's attention weight through element-wise multiplication.…”
Section: Layer Attention Map Generation
confidence: 99%
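The gradient-weighted attention idea described in this statement (element-wise multiplication of attention maps by their gradients) can be illustrated with a short sketch. This is not the authors' reference implementation; it assumes a HuggingFace-style classifier that exposes per-layer attention maps via `output_attentions=True` and keeps them in the autograd graph:

import torch

def gradient_weighted_attention(model, inputs, target_class):
    """Sketch of gradient-weighted attention in the spirit of Grad-SAM:
    multiply each attention map element-wise by the ReLU of its gradient
    w.r.t. the target logit, then average over heads, layers, and query
    positions to obtain one relevance score per token."""
    outputs = model(**inputs, output_attentions=True)
    score = outputs.logits[0, target_class]
    attentions = outputs.attentions                 # tuple of (batch, heads, seq, seq)
    grads = torch.autograd.grad(score, attentions)
    relevance = torch.zeros_like(attentions[0][0, 0])
    for attn, grad in zip(attentions, grads):
        # Weight attention by the positive part of its gradient, average over heads.
        relevance = relevance + (attn[0] * torch.relu(grad[0])).mean(dim=0)
    relevance = relevance / len(attentions)         # average over layers
    return relevance.mean(dim=0)                    # (seq,) per-token relevance

Averaging over the query dimension, rather than reading off only the "cls" row, is one of the design choices the quoted passage highlights when comparing these methods.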