“…To validate the generalization ability of the proposed method, we conduct experiments on the recently published Spot-the-Diff dataset, in which the image pairs are mostly well aligned and there is no viewpoint change. We compare against eight SOTA methods, most of which are not designed to handle viewpoint changes: DDLA (Jhamtani and Berg-Kirkpatrick, 2018), DDUA (Park et al, 2019), SDCM (Oluwasanmi et al, 2019a), FCC (Oluwasanmi et al, 2019b), static rel-att / dynamic rel-att (Tan et al, 2019), and M-VAM / M-VAM+RAF (Shi et al, 2020).…”