Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1453
Jointly Learning to Align and Translate with Transformer Models

Abstract: The state of the art in machine translation (MT) is governed by neural approaches, which typically provide superior translation accuracy over statistical approaches. However, on the closely related task of word alignment, traditional statistical word alignment models often remain the go-to solution. In this paper, we present an approach to train a Transformer model to produce both accurate translations and alignments. We extract discrete alignments from the attention probabilities learnt during regular neural …
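The extraction step sketched in the abstract, reading discrete word alignments off soft attention probabilities, can be illustrated briefly: average the cross-attention of one decoder layer over heads, then take an argmax per target token. The following Python sketch is illustrative only, not the paper's released code; the array shape and function name are assumptions.

import numpy as np

def extract_alignments(attn):
    """Read discrete word alignments off soft attention probabilities.

    attn: hypothetical array of shape (num_heads, num_tgt, num_src) holding
          the cross-attention probabilities of a single decoder layer.
    Returns a set of (src_index, tgt_index) links.
    """
    # Average the attention distributions over heads.
    avg = attn.mean(axis=0)  # shape (num_tgt, num_src)
    # Align each target token to the source token receiving the most attention mass.
    return {(int(np.argmax(avg[t])), t) for t in range(avg.shape[0])}

# Toy usage: 2 heads, 3 target tokens, 4 source tokens.
attn = np.random.dirichlet(np.ones(4), size=(2, 3))
print(extract_alignments(attn))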

Cited by 109 publications (147 citation statements). References 24 publications (31 reference statements).
“…We explore this hypothesis on the widely used Gold Alignment dataset 3 and follow Tang et al (2019) to perform the alignment. The only difference being that we average the attention matrices across all heads from the penultimate layer (Garg et al, 2019). The alignment error rate (AER, Och and Ney 2003), precision (P) and recall (R) are reported as the evaluation metrics.…”
Section: Alignment Quality
Citation type: mentioning, confidence: 99%
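For reference, the metrics named in this excerpt follow Och and Ney (2003): precision is measured against the possible links P, recall against the sure links S, and AER combines both. Below is a minimal sketch assuming alignments are given as sets of (source, target) index pairs; the function and variable names are illustrative, not taken from any cited implementation.

def alignment_scores(hyp, sure, possible):
    # hyp:      set of hypothesised (src, tgt) links A
    # sure:     set of gold sure links S
    # possible: set of gold possible links P (S is a subset of P)
    a_and_p = len(hyp & possible)
    a_and_s = len(hyp & sure)
    precision = a_and_p / len(hyp) if hyp else 0.0
    recall = a_and_s / len(sure) if sure else 0.0
    # AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
    aer = 1.0 - (a_and_s + a_and_p) / (len(hyp) + len(sure))
    return precision, recall, aer

# Toy example: two hypothesised links, one matching a sure link.
print(alignment_scores(hyp={(0, 0), (1, 2)},
                       sure={(0, 0)},
                       possible={(0, 0), (1, 1)}))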
“…A closely related research area attempts to guide the attention mechanism, e.g. by incorporating alignment objectives (Garg et al, 2019), or improving the representation through external information such as syntactic supervision (Pham et al, 2019; Currey and Heafield, 2019; Deguchi et al, 2019). The third line of research argues that Transformer networks are over-parametrized and learn redundant information that can be pruned in various ways (Sanh et al, 2019).…”
Section: Introduction
Citation type: mentioning, confidence: 99%
“…The attention mechanism in NMT does not functionally play the role of word alignments between the source and the target, at least not in the same way as its analog in SMT. It is hard to interpret the attention activations and extract meaningful word alignments especially from Transformer (Garg et al, 2019). As a result, the most widely used word alignment tools are still external statistical models such as FAST-ALIGN (Dyer et al, 2013) and GIZA++ (Brown et al, 1993; Och and Ney, 2003).…”
Section: Introduction
Citation type: mentioning, confidence: 99%
“…1. However, such schedule only captures noisy word alignments (Ding et al, 2019; Garg et al, 2019). One of the major problems is that it induces alignment before observing the to-be-aligned target token (Peter et al, 2017; Ding et al, 2019).…”
Section: Introduction
Citation type: mentioning, confidence: 99%