“…In the past few years, there have been several efforts in taking advantage of images to discover and enhance connections across different languages (Gella et al., 2017; Nakayama and Nishida, 2017; Elliott and Kádár, 2017). While some works have exploited alignments at the word-level (Bergsma and Van Durme, 2011; Hewitt et al., 2018), recent work has moved forward to finding alignments between complex sentences (Barrault et al., 2018; Surís et al., 2020; Sigurdsson et al., 2020; Yang et al., 2020).…”