2019 IEEE Visualization Conference (VIS)
DOI: 10.1109/visual.2019.8933677

SANVis: Visual Analytics for Understanding Self-Attention Networks

Abstract: Figure 1: Overview of SANVis. (A) The network view displays multiple attention patterns for each layer according to three types of visualization options: (A-1) the attention piling option, (A-2) the Sankey diagram option, and (A-3) the small multiples option. (A-4) The bar chart shows the average attention weights for all heads (each colored with its corresponding hue) for each layer. (B) The HeadLens view helps the user analyze what the attention head learned by showing representative words and by providing st…
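As a rough illustration of the statistic behind panel (A-4), here is a minimal sketch of computing per-head average attention weights, assuming row-normalized attention tensors of shape (num_heads, seq_len, seq_len) per layer; the names and toy data are illustrative, not from the SANVis implementation:

```python
import numpy as np

def average_head_attention(attn_per_layer):
    """One scalar per head per layer: the mean attention weight.

    attn_per_layer: list of arrays, one per layer, each of shape
    (num_heads, seq_len, seq_len) holding row-normalized attention.
    Returns an array of shape (num_layers, num_heads).
    """
    return np.stack([layer.mean(axis=(1, 2)) for layer in attn_per_layer])

# Toy example: 2 layers, 4 heads, sequence length 5.
rng = np.random.default_rng(0)
attn = [rng.dirichlet(np.ones(5), size=(4, 5)) for _ in range(2)]
print(average_head_attention(attn).shape)  # (2, 4)
```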

Cited by 22 publications (28 citation statements). References 24 publications (23 reference statements).

Citation statements, ordered by relevance:
“…While existing theoretical and empirical findings suggest the importance of diagonal elements in the self-attention matrix, we observe that they are indeed the least important compared to the other entries. Furthermore, neighborhood tokens and special tokens (such as the first token [CLS] and last token [SEP]) are also prominent, which is consistent with previous observations (Park et al., 2019; Gong et al., 2019; Kovaleva et al., 2019; Clark et al., 2019). Besides, using the Gumbel-sigmoid function (Maddison et al., 2017), we propose the Differentiable Attention Mask (DAM) algorithm to learn the attention mask in an end-to-end manner.…”
Section: Introduction (supporting)
confidence: 82%
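The DAM algorithm itself is specified in the cited work; purely as an illustration of the Gumbel-sigmoid (Binary-Concrete) relaxation it builds on (Maddison et al., 2017), a minimal sketch follows. The masking scheme and all names here are hypothetical, not the cited authors' formulation:

```python
import torch

def gumbel_sigmoid(logits, temperature=0.5):
    """Binary-Concrete relaxation: a differentiable surrogate for
    Bernoulli sampling, usable for learning attention masks end-to-end."""
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)  # Uniform noise.
    noise = torch.log(u) - torch.log1p(-u)             # Logistic noise.
    return torch.sigmoid((logits + noise) / temperature)

# Hypothetical usage: learn a soft mask over a (seq_len x seq_len) attention map.
seq_len = 6
mask_logits = torch.zeros(seq_len, seq_len, requires_grad=True)
soft_mask = gumbel_sigmoid(mask_logits)        # Entries in (0, 1).
attn = torch.softmax(torch.randn(seq_len, seq_len), dim=-1)
masked = attn * soft_mask                      # Differentiable w.r.t. mask_logits.
masked.sum().backward()
print(mask_logits.grad.shape)                  # torch.Size([6, 6])
```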
“…Recently, its interpretation has attracted considerable interest. Visualization has been commonly used to understand the attention map during inference (Park et al., 2019; Gong et al., 2019; Kovaleva et al., 2019). For example, Park et al. (2019) and Gong et al. (2019) randomly select a sentence from the corpus and visualize the attention maps of different heads in a pre-trained transformer model.…”
Section: Introduction (mentioning)
confidence: 99%
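A minimal sketch of the kind of per-head attention heatmap such visualizations are built on, using toy data; this is a generic recipe, not the figure pipeline of any cited tool:

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data only: one head's row-stochastic attention map for a short sentence.
tokens = ["[CLS]", "the", "cat", "sat", "[SEP]"]
rng = np.random.default_rng(1)
attn = rng.dirichlet(np.ones(len(tokens)), size=len(tokens))

fig, ax = plt.subplots()
im = ax.imshow(attn, cmap="viridis")
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens, rotation=45)
ax.set_yticks(range(len(tokens)))
ax.set_yticklabels(tokens)
ax.set_xlabel("Key (attended-to token)")
ax.set_ylabel("Query (attending token)")
fig.colorbar(im, ax=ax, label="Attention weight")
fig.tight_layout()
plt.show()
```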
“…Visual LM Explanations - Approaches for visual LM explanations can be grouped into two main categories. One strand of research focuses on transformer-based LMs and explains how they learn by visualizing attention (e.g., NL-IZE (Liu et al., 2018), Seq2Seq-Vis (Strobelt et al., 2018), BertViz (Vig, 2019), exBERT (Hoover et al., 2020), SANVis (Park et al., 2019), and Attention Flows (DeRose et al., 2021)). Another strand of research explains what the model learns by visualizing word embeddings.…”
Section: Interpretability Of Language Models (mentioning)
confidence: 99%
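As a generic example of the second strand, a minimal sketch that projects toy word vectors to 2D with PCA and plots them; real systems would load trained embeddings and often use t-SNE or UMAP instead:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Toy stand-ins for trained word vectors; real systems would load these
# from a model such as word2vec, GloVe, or a transformer's embedding layer.
words = ["king", "queen", "apple", "orange", "paris", "london"]
rng = np.random.default_rng(2)
embeddings = rng.normal(size=(len(words), 300))

coords = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y))
plt.title("2D PCA projection of word embeddings (toy data)")
plt.show()
```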
“…For example, Lin et al. [79] and Han et al. [114] explored and visualized attention by annotating sentences. Park et al. [98] presented SANVis, a DVA system, to help understand the attention mechanism of the transformer in NLP scenarios.…”
Section: Natural Language Interfaces (mentioning)
confidence: 99%