“…Cache/memory-based approaches (Tu et al, 2018; Kuang et al, 2018; Maruf and Haffari, 2018; Wang et al, 2017) store word- or sentence-level translations from previous sentences for use when translating later sentences. Various approaches with an extra context encoder have been proposed to model either local context, e.g., previous sentences (Wang et al, 2017; Bawden et al, 2018; Voita et al, 2018, 2019b; Yang et al, 2019; Huo et al, 2020), or the entire document (Maruf and Haffari, 2018; Mace and Servan, 2019; Maruf et al, 2019; Tan et al, 2019; Zheng et al, 2020; Kang et al, 2020).…”