“…Recent advancements (Zhang et al, 2017;Artetxe et al, 2018;Lample et al, 2018a;Conneau and Lample, 2019;Song et al, 2019;Han et al, 2021) in UNMT have achieved remarkable milestones. However, there are still certain issues associated with solely utilizing monolingual data for translation, such as domain mismatch problem (Marchisio et al, 2020;Kim et al, 2020), poor performance on distance language pairs (Kim et al, 2020;Chronopoulou et al, 2020;Sun et al, 2021), translationese (He et al, 2022), and lexical confusion (Bapna et al, 2022;Jones et al, 2023). Our work aims to address the issue of lexical confusion by utilizing wordlevel image data, thereby eliminating the need for costly bilingual dictionary annotations.…”