Proceedings of the Third Conference on Machine Translation: Research Papers 2018
DOI: 10.18653/v1/w18-6318
On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation

Abstract: This work investigates the alignment problem in state-of-the-art multi-head attention models based on the transformer architecture. We demonstrate that alignment extraction in transformer models can be improved by augmenting the multi-head source-to-target attention component with an additional alignment head, which is used to compute sharper attention weights. We describe how to use the alignment head to achieve competitive performance. To study the effect of adding the alignment head, we simulate a dictionary-gu…
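The abstract's core idea, an extra attention head dedicated to alignment within the source-to-target attention, can be pictured with a minimal sketch. The class below is an illustration only: the layer names, dimensions, and the temperature knob used to sharpen the attention weights are assumptions of this sketch, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

class AlignmentHead(torch.nn.Module):
    """Extra source-to-target attention head used for alignment extraction,
    kept alongside the regular multi-head attention of a transformer decoder."""

    def __init__(self, d_model, temperature=0.5):
        super().__init__()
        self.q = torch.nn.Linear(d_model, d_model)  # projects decoder states
        self.k = torch.nn.Linear(d_model, d_model)  # projects encoder states
        self.temperature = temperature              # < 1.0 sharpens the softmax

    def forward(self, decoder_states, encoder_states):
        # decoder_states: (tgt_len, d_model), encoder_states: (src_len, d_model)
        scores = self.q(decoder_states) @ self.k(encoder_states).t()
        scores = scores / (decoder_states.size(-1) ** 0.5 * self.temperature)
        weights = F.softmax(scores, dim=-1)   # (tgt_len, src_len) attention matrix
        alignment = weights.argmax(dim=-1)    # hard source position per target token
        return weights, alignment
```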

Cited by 46 publications (53 citation statements)
References 22 publications
“…Word alignments are essential for statistical machine translation and useful in NMT, e.g., for imposing priors on attention matrices (Liu et al., 2016; Chen et al., 2016; Alkhouli and Ney, 2017; Alkhouli et al., 2018) or for decoding (Alkhouli et al., 2016; Press and Smith, 2018). Further, word alignments have been successfully used in a range of tasks such as typological analysis (Lewis and Xia, 2008; Östling, 2015b), annotation projection (Yarowsky et al., 2001; Padó and Lapata, 2009; Asgari and Schütze, 2017; Huck et al., 2019) and creating multilingual embeddings (Guo et al., 2016; Ammar et al., 2016; Dufter et al., 2018). Statistical word aligners such as the IBM models (Brown et al., 1993) and their implementations Giza++ (Och and Ney, 2003) and fast-align (Dyer et al., 2013), as well as newer models such as eflomal (Östling and Tiedemann, 2016), are widely used for alignment.…”
Section: Introduction
confidence: 99%
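As a concrete illustration of "imposing priors on attention matrices", one common recipe is a guided-alignment loss that pushes the decoder's attention toward alignments produced by an external aligner such as Giza++ or fast-align. The sketch below is a hedged PyTorch example; the function name, the cross-entropy formulation, and the weighting term are assumptions chosen for illustration rather than a reproduction of any single cited method.

```python
import torch

def guided_alignment_loss(attention, alignment, eps=1e-8):
    """Cross-entropy between an attention matrix and an external 0/1 word alignment.

    attention: (tgt_len, src_len) attention weights, each row sums to 1.
    alignment: (tgt_len, src_len) binary matrix from an external word aligner.
    """
    row_sums = alignment.sum(dim=-1, keepdim=True)
    aligned = row_sums.squeeze(-1) > 0                  # skip unaligned target tokens
    target_dist = alignment / row_sums.clamp_min(eps)   # normalize rows to distributions
    ce = -(target_dist * (attention + eps).log()).sum(dim=-1)
    return ce[aligned].mean() if aligned.any() else attention.new_zeros(())

# During training this term would be added to the usual translation loss,
# e.g. loss = nmt_loss + lam * guided_alignment_loss(attn, align),
# where lam is a hyperparameter of this sketch.
```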
“…Similar to MTL-FULLC (Garg et al., 2019), BAO-GUIDED adapts the alignment induction to the to-be-aligned target token by requiring the full target sentence as input. Therefore, BAO-GUIDED is not applicable in cases where alignments are incrementally computed during the decoding process, e.g., dictionary-guided decoding (Alkhouli et al., 2018). In contrast, SHIFT-AET performs quite well in such cases (Section 4.3).…”
Section: Alignment Results
confidence: 99%
“…In addition to AER, we compare the performance of NAIVE-ATT, SHIFT-ATT and SHIFT-AET on dictionary-guided machine translation (Song et al., 2020), which is an alignment-based downstream task. Given source and target constraint pairs from a dictionary, the NMT model is encouraged to translate with the provided constraints via word alignments (Alkhouli et al., 2018; Hasler et al., 2018; Hokamp and Liu, 2017; Song et al., 2020). More specifically, at each decoding step, the last token of the candidate translation is revised with the target constraint if it is aligned to the corresponding source constraint according to the alignment induction method.…”
Section: Downstream Task Results
confidence: 99%
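The revision step described in this quotation can be made concrete with a small sketch. Everything here, the function name, the representation of constraints as a map from source positions to target tokens, and the use of an argmax over an alignment head's attention row, is a hypothetical illustration of dictionary-guided decoding, not code from the cited papers.

```python
def apply_dictionary_constraint(hypothesis, attention_row, constraints):
    """Revise the last token of a partial translation with a dictionary constraint.

    hypothesis:    list of target tokens generated so far.
    attention_row: alignment weights over source positions for the last token
                   (e.g. one row of an alignment head's attention matrix).
    constraints:   dict mapping a source position to the required target token
                   (hypothetical format chosen for this sketch).
    """
    if not hypothesis:
        return hypothesis
    aligned_src = max(range(len(attention_row)), key=attention_row.__getitem__)
    if aligned_src in constraints:
        hypothesis[-1] = constraints[aligned_src]  # enforce the dictionary entry
    return hypothesis
```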
“…Deriving alignments is known to be more challenging for transformer networks with self-attention and multiple attention heads. There has been some recent work on alleviating this issue by explicitly adding an alignment head to the base architecture [15].…”
Section: Choice Of NMT Architecture
confidence: 99%