2016
DOI: 10.48550/arxiv.1609.04186
Preprint

Neural Machine Translation with Supervised Attention

Lemao Liu,
Masao Utiyama,
Andrew Finch
et al.

Abstract: The attention mechanism is appealing for neural machine translation, since it is able to dynamically encode a source sentence by generating an alignment between a target word and source words. Unfortunately, it has been proved to be worse than conventional alignment models in alignment accuracy. In this paper, we analyze and explain this issue from the point of view of reordering, and propose a supervised attention which is learned with guidance from conventional alignment models. Experiments on two Chinese-to-Eng…
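The abstract describes supervising the attention weights with alignments from a conventional aligner in addition to the usual translation loss. The following is a minimal sketch of that idea, not the authors' code: the tensor shapes, the loss form (cross-entropy against reference alignments), the interpolation weight lambda_align, and the function name are all illustrative assumptions.

```python
# Sketch of attention supervision (assumed formulation, not the paper's exact objective):
# add a penalty on the divergence between the model's attention weights and "gold"
# alignments produced by a conventional aligner (e.g. one-best links from GIZA++).
import torch
import torch.nn.functional as F

def supervised_attention_loss(attn_weights, gold_alignments, eps=1e-8):
    """Cross-entropy between predicted attention and reference alignments.

    attn_weights:    (batch, tgt_len, src_len), each row sums to 1 (softmax output).
    gold_alignments: (batch, tgt_len, src_len), reference alignment distribution,
                     e.g. one-hot links or smoothed counts from a conventional aligner.
    """
    return -(gold_alignments * torch.log(attn_weights + eps)).sum(-1).mean()

# Toy example: one sentence pair, 3 target words attending over 4 source words.
attn = F.softmax(torch.randn(1, 3, 4), dim=-1)            # model's attention weights
gold = F.one_hot(torch.tensor([[0, 2, 3]]), 4).float()    # aligner's one-best links

translation_loss = torch.tensor(2.5)   # placeholder for the usual NLL translation loss
lambda_align = 0.5                     # assumed interpolation weight, not from the paper
total_loss = translation_loss + lambda_align * supervised_attention_loss(attn, gold)
print(total_loss)
```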

Cited by 6 publications (7 citation statements)
References 10 publications
“…Lastly, the deliberate training of attention weights has been studied in several papers in which the goal is not to study the explanatory power of attention weights but rather to achieve better predictive performance by introducing an additional source of supervision. In some of these papers, attention weights are guided by known word alignments in machine translation (Liu et al., 2016; Chen et al., 2016), or by aligning human eye gaze with a model's attention for sequence classification (Barrett et al., 2018).…”
Section: Related Work (mentioning)
confidence: 99%
“…This research has largely been conducted by visualizing or analyzing the learned attention weights of a whole attention module on only NLP tasks [17,40,18,25]. Many works [17,40,18] suggest that attention weight assignment in encoder-decoder attention plays a role similar to word alignment in traditional approaches [1,8,30,6]. The implicit underlying assumption in these works is that the input elements accorded high attention weights are responsible for the model outputs.…”
Section: Analysis of Spatial Attention Mechanisms (mentioning)
confidence: 99%
“…Nevertheless, further refining attention by extra supervision has been shown to be beneficial. Examples include using word alignments to learn attention in neural machine translation (Liu et al., 2016), employing argument words to supervise attention in event detection (Liu et al., 2017), and utilizing linguistically-motivated annotations to guide attention in constituency parsing (Kamigaito et al., 2017). These supervision mechanisms are tailored to specific applications.…”
Section: Related Work (mentioning)
confidence: 99%