2019 IEEE Visualization Conference (VIS)
DOI: 10.1109/visual.2019.8933677

SANVis: Visual Analytics for Understanding Self-Attention Networks

Abstract: Figure 1: Overview of SANVis. (A) The network view displays multiple attention patterns for each layer according to three types of visualization options: (A-1) the attention piling option, (A-2) the Sankey diagram option, and (A-3) the small multiples option. (A-4) The bar chart shows the average attention weights for all heads (each colored with its corresponding hue) for each layer. (B) The HeadLens view helps the user analyze what the attention head learned by showing representative words and by providing st…
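As a rough illustration of the statistic behind panel (A-4), here is a minimal sketch of computing per-head average attention weights, assuming row-normalized attention tensors of shape (num_heads, seq_len, seq_len) per layer; the names and toy data are illustrative, not from the SANVis implementation:

```python
import numpy as np

def average_head_attention(attn_per_layer):
    """One scalar per head per layer: the mean attention weight.

    attn_per_layer: list of arrays, one per layer, each of shape
    (num_heads, seq_len, seq_len) holding row-normalized attention.
    Returns an array of shape (num_layers, num_heads).
    """
    return np.stack([layer.mean(axis=(1, 2)) for layer in attn_per_layer])

# Toy example: 2 layers, 4 heads, sequence length 5.
rng = np.random.default_rng(0)
attn = [rng.dirichlet(np.ones(5), size=(4, 5)) for _ in range(2)]
print(average_head_attention(attn).shape)  # (2, 4)
```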

Cited by 22 publications (28 citation statements). References 24 publications (23 reference statements).

Citation statements, ordered by relevance:
“…While existing theoretical and empirical findings suggest the importance of diagonal elements in the self-attention matrix, we observe that they are indeed the least important compared to the other entries. Furthermore, neighborhood tokens and special tokens (such as the first token [CLS] and last token [SEP]) are also prominent, which is consistent with previous observations (Park et al., 2019; Gong et al., 2019; Kovaleva et al., 2019; Clark et al., 2019). Besides, using the Gumbel-sigmoid function (Maddison et al., 2017), we propose the Differentiable Attention Mask (DAM) algorithm to learn the attention mask in an end-to-end manner.…”
Section: Introduction (supporting)
confidence: 82%
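The DAM algorithm itself is specified in the cited work; purely as an illustration of the Gumbel-sigmoid (Binary-Concrete) relaxation it builds on (Maddison et al., 2017), a minimal sketch follows. The masking scheme and all names here are hypothetical, not the cited authors' formulation:

```python
import torch

def gumbel_sigmoid(logits, temperature=0.5):
    """Binary-Concrete relaxation: a differentiable surrogate for
    Bernoulli sampling, usable for learning attention masks end-to-end."""
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)  # Uniform noise.
    noise = torch.log(u) - torch.log1p(-u)             # Logistic noise.
    return torch.sigmoid((logits + noise) / temperature)

# Hypothetical usage: learn a soft mask over a (seq_len x seq_len) attention map.
seq_len = 6
mask_logits = torch.zeros(seq_len, seq_len, requires_grad=True)
soft_mask = gumbel_sigmoid(mask_logits)        # Entries in (0, 1).
attn = torch.softmax(torch.randn(seq_len, seq_len), dim=-1)
masked = attn * soft_mask                      # Differentiable w.r.t. mask_logits.
masked.sum().backward()
print(mask_logits.grad.shape)                  # torch.Size([6, 6])
```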
“…Recently, its interpretation has attracted considerable interest. Visualization has been commonly used to understand the attention map during inference (Park et al., 2019; Gong et al., 2019; Kovaleva et al., 2019). For example, Park et al. (2019) and Gong et al. (2019) randomly select a sentence from the corpus and visualize the attention maps of different heads in a pre-trained transformer model.…”
Section: Introduction (mentioning)
confidence: 99%
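A minimal sketch of the kind of per-head attention heatmap such visualizations are built on, using toy data; this is a generic recipe, not the figure pipeline of any cited tool:

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data only: one head's row-stochastic attention map for a short sentence.
tokens = ["[CLS]", "the", "cat", "sat", "[SEP]"]
rng = np.random.default_rng(1)
attn = rng.dirichlet(np.ones(len(tokens)), size=len(tokens))

fig, ax = plt.subplots()
im = ax.imshow(attn, cmap="viridis")
ax.set_xticks(range(len(tokens)))
ax.set_xticklabels(tokens, rotation=45)
ax.set_yticks(range(len(tokens)))
ax.set_yticklabels(tokens)
ax.set_xlabel("Key (attended-to token)")
ax.set_ylabel("Query (attending token)")
fig.colorbar(im, ax=ax, label="Attention weight")
fig.tight_layout()
plt.show()
```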
“…Visual LM Explanations - Approaches for visual LM explanations can be grouped into two main categories. One strand of research focuses on transformer-based LMs and explains how they learn by visualizing attention (e.g., NL-IZE (Liu et al., 2018), Seq2Seq-Vis (Strobelt et al., 2018), BertViz (Vig, 2019), exBERT (Hoover et al., 2020), SANVis (Park et al., 2019), and Attention Flows (DeRose et al., 2021)). Another strand of research explains what the model learns by visualizing word embeddings.…”
Section: Interpretability Of Language Models (mentioning)
confidence: 99%
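As a generic example of the second strand, a minimal sketch that projects toy word vectors to 2D with PCA and plots them; real systems would load trained embeddings and often use t-SNE or UMAP instead:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Toy stand-ins for trained word vectors; real systems would load these
# from a model such as word2vec, GloVe, or a transformer's embedding layer.
words = ["king", "queen", "apple", "orange", "paris", "london"]
rng = np.random.default_rng(2)
embeddings = rng.normal(size=(len(words), 300))

coords = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y))
plt.title("2D PCA projection of word embeddings (toy data)")
plt.show()
```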
“…For example, Lin et al. [79] and Han et al. [114] explored and visualized attention by annotating sentences. Park et al. [98] presented SANVis, a DVA system, to help understand the attention mechanism of the transformer in NLP scenarios.…”
Section: Natural Language Interfaces (mentioning)
confidence: 99%