“…However, most of the more recent work has focused on content similarity via bag-of-words or bag-of-ngrams, using bilingual lexicons (Ma and Liberman, 1999; Fung and Cheung, 2004; Ion et al., 2011; Esplà-Gomis et al., 2016; Azpeitia and Etchegoyhen, 2019), machine translation (Uszkoreit et al., 2010), or phrase tables (Gomes and Pereira Lopes, 2016). Some work has considered high-level order as a filtering step after using an unordered representation to generate candidates: Ma and Liberman (1999) and Le et al. (2016) discard n-gram pairs outside a fixed window, while Uszkoreit et al. (2010) filter out documents that have a high edit distance between sequences of corresponding n-gram pairs. Utiyama and Isahara (2003) and Zhang et al. (2006) use sentence similarity and/or the number of aligned sentences after performing sentence alignment to score candidate documents.…”
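
To make the two-stage idea in the passage concrete, the following is a minimal Python sketch, not the method of any of the cited papers. It assumes a toy one-to-one bilingual lexicon, scores a document pair by cosine similarity over bags of n-grams, and then applies an order-based filter using a normalized edit distance over the sequences of shared n-grams. All names (`translate`, `score_pair`, `max_norm_edit`) and the threshold value are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n=2):
    """Return the n-grams (as tuples) of a token list, in order of occurrence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def translate(tokens, lexicon):
    """Map tokens through a hypothetical one-to-one lexicon; unknown words are dropped."""
    return [lexicon[t] for t in tokens if t in lexicon]


def cosine(a, b):
    """Cosine similarity between two Counter bags (0.0 if either bag is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def edit_distance(xs, ys):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(ys) + 1))
    for i, x in enumerate(xs, 1):
        cur = [i]
        for j, y in enumerate(ys, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def score_pair(src_tokens, tgt_tokens, lexicon, n=2, max_norm_edit=0.5):
    """Return (bag similarity, passes_order_filter) for one candidate document pair."""
    src = ngrams(translate(src_tokens, lexicon), n)
    tgt = ngrams(tgt_tokens, n)
    # Stage 1: unordered content similarity over bags of n-grams.
    similarity = cosine(Counter(src), Counter(tgt))
    # Stage 2: order check restricted to n-grams that occur in both documents.
    shared = set(src) & set(tgt)
    src_seq = [g for g in src if g in shared]
    tgt_seq = [g for g in tgt if g in shared]
    longest = max(len(src_seq), len(tgt_seq))
    # Assumption: a pair with no shared n-grams fails the order filter.
    norm_edit = edit_distance(src_seq, tgt_seq) / longest if longest else 1.0
    return similarity, norm_edit <= max_norm_edit
```

In practice the unordered score would be used to shortlist candidates cheaply, and the order-based check would run only on that shortlist, since it is the more expensive step.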