“…Word alignments are essential for statistical machine translation and useful in NMT, e.g., for imposing priors on attention matrices (Liu et al., 2016; Chen et al., 2016; Alkhouli and Ney, 2017; Alkhouli et al., 2018) or for decoding (Alkhouli et al., 2016; Press and Smith, 2018). Further, word alignments have been used successfully in a range of tasks such as typological analysis (Lewis and Xia, 2008; Östling, 2015b), annotation projection (Yarowsky et al., 2001; Padó and Lapata, 2009; Asgari and Schütze, 2017; Huck et al., 2019), and creating multilingual embeddings (Guo et al., 2016; Ammar et al., 2016; Dufter et al., 2018). Statistical word aligners such as the IBM models (Brown et al., 1993) and their implementations Giza++ (Och and Ney, 2003) and fast-align (Dyer et al., 2013), as well as newer models such as eflomal (Östling and Tiedemann, 2016), are widely used for alignment.…”