Importance of Self-Attention for Sentiment Analysis

Letarte, Gaël; Paradis, F.; Giguère, Philippe; Laviolette, François

doi:10.18653/v1/w18-5429

Cited by 52 publications

(19 citation statements)

References 18 publications

(17 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…An empirical evaluation is thus beyond the scope of this article. There are, however, a number of experimental studies focused on particular NLP tasks, including machine translation [37], [42], [48], [132], argumentation mining [125], text summarization [58], and sentiment analysis [7]. It is worthwhile remarking that, on several occasions, attention-based approaches enabled a dramatic development of entire research lines.…”

Section: Introductionmentioning

confidence: 99%

Attention in Natural Language Processing

Galassi

Lippi

Torroni

2021

IEEE Trans. Neural Netw. Learning Syst.

350

130

View full text Add to dashboard Cite

Attention is an increasingly popular mechanism used in a wide range of neural architectures. The mechanism itself has been realized in a variety of formats. However, because of the fast-paced advances in this domain, a systematic overview of attention is still missing. In this article, we define a unified model for attention architectures in natural language processing, with a focus on those designed to work with vector representations of the textual data. We propose a taxonomy of attention models according to four dimensions: the representation of the input, the compatibility function, the distribution function, and the multiplicity of the input and/or output. We present the examples of how prior information can be exploited in attention models and discuss ongoing research efforts and open challenges in the area, providing the first extensive categorization of the vast body of literature in this exciting domain.

show abstract

Section: Introductionmentioning

confidence: 99%

Attention in Natural Language Processing

Galassi

Lippi

Torroni

2021

IEEE Trans. Neural Netw. Learning Syst.

350

130

View full text Add to dashboard Cite

show abstract

“…Talking about this paradigm, various works focus on the weights of the attention layer in transformers [56] or other kinds of networks, such as the recurrent or the convolutional ones, to highlight the words or n-grams in the text that are the most relevant for the decision. Regarding the sentiment analysis task, authors in [57] observed a strong interaction between neighboring words visualizing the attention matrix of a transformerlike network. Furthermore, in [58], the authors of the work discussed the use of attention scores from an attention layer as a good and less computationally burdensome alternative to external explainer models like LIME [59,60] and integrated gradients [61] methods.…”

Section: Attention As Explanationmentioning

confidence: 99%

Explainable Sentiment Analysis: A Hierarchical Transformer-Based Extractive Summarization Approach

et al. 2021

View full text Add to dashboard Cite

In recent years, the explainable artificial intelligence (XAI) paradigm is gaining wide research interest. The natural language processing (NLP) community is also approaching the shift of paradigm: building a suite of models that provide an explanation of the decision on some main task, without affecting the performances. It is not an easy job for sure, especially when very poorly interpretable models are involved, like the almost ubiquitous (at least in the NLP literature of the last years) transformers. Here, we propose two different transformer-based methodologies exploiting the inner hierarchy of the documents to perform a sentiment analysis task while extracting the most important (with regards to the model decision) sentences to build a summary as the explanation of the output. For the first architecture, we placed two transformers in cascade and leveraged the attention weights of the second one to build the summary. For the other architecture, we employed a single transformer to classify the single sentences in the document and then combine the probability scores of each to perform the classification and then build the summary. We compared the two methodologies by using the IMDB dataset, both in terms of classification and explainability performances. To assess the explainability part, we propose two kinds of metrics, based on benchmarking the models’ summaries with human annotations. We recruited four independent operators to annotate few documents retrieved from the original dataset. Furthermore, we conducted an ablation study to highlight how implementing some strategies leads to important improvements on the explainability performance of the cascade transformers model.

show abstract

“…For text classification which only has single input sequence, attention based models mainly focus on applying attention mechanism on top of CNN or RNN for selecting the more important information (Yang et al, 2016;Er et al, 2016). Letarte et al (2018) and Shen et al (2018) also explore self-attention networks which is CNN/RNN free.…”

Section: Modesmentioning

confidence: 99%

“…For compared previous models, first block lists n-grams based models including bigram-FastText (Joulin et al, 2016) and region embedding (Qiao et al, 2018). Self-attention Networks SANet (Letarte et al, 2018) is reported in the second block. RNN based models LSTM (Zhang et al, 2015), D-LSTM (Yogatama et al, 2017) and CNN based models char-CNN (Zhang et al, 2015) and VDCNN (Conneau et al, 2016) are listed in third and forth block respectively.…”

Section: Experiments Settingsmentioning

confidence: 99%

Enhancing Local Feature Extraction with Global Representation for Neural Text Classification

Niu¹,

Xu²,

He³

et al. 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen

View full text Add to dashboard Cite

For text classification, traditional local feature driven models learn long dependency by deeply stacking or hybrid modeling. This paper proposes a novel Encoder1-Encoder2 architecture, where global information is incorporated into the procedure of local feature extraction from scratch. In particular, En-coder1 serves as a global information provider, while Encoder2 performs as a local feature extractor and is directly fed into the classifier. Meanwhile, two modes are also designed for their interactions. Thanks to the awareness of global information, our method is able to learn better instance specific local features and thus avoids complicated upper operations. Experiments conducted on eight benchmark datasets demonstrate that our proposed architecture promotes local feature driven models by a substantial margin and outperforms the previous best models in the fully-supervised setting.

show abstract

Importance of Self-Attention for Sentiment Analysis

Cited by 52 publications

References 18 publications

Attention in Natural Language Processing

Attention in Natural Language Processing

Explainable Sentiment Analysis: A Hierarchical Transformer-Based Extractive Summarization Approach

Enhancing Local Feature Extraction with Global Representation for Neural Text Classification

Contact Info

Product

Resources

About