“…The vulnerability of modern neural networks to human-imperceptible input variations has been studied since Szegedy et al. (2013), primarily in the computer vision community (e.g., Goodfellow et al., 2015), and later extended to the NLP community (e.g., Ebrahimi et al., 2017; Liang et al., 2017; Yin et al., 2020; Jones et al., 2020; Jia et al., 2019; Liu et al., 2019; Pruthi et al., 2019). Recent studies suggest that this fragility is rooted in the fact that the data contains multiple signals capable of reducing the empirical risk: when a model is forced to reduce its training error, it picks up whatever information diminishes the empirical loss, regardless of whether the learned knowledge aligns with human perception (Wang et al., 2019b). This connects the adversarial robustness problem to the problem of bias in data, which has also been studied for some time (e.g., Wang et al., 2016; Goyal et al., 2017; Kaushik and Lipton, 2018; Wang et al., 2019a).…”
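The "multiple signals" argument can be made concrete with a toy experiment. The sketch below (illustrative only, not drawn from any of the cited papers; all variable names and the data-generating process are assumptions) trains a plain empirical-risk minimizer on data where a genuine feature and a spurious, dataset-bias feature both predict the label. The model latches onto the spurious signal because it reduces the training loss more, and accuracy collapses once that correlation breaks at test time:

```python
# Minimal sketch: an ERM-trained model prefers whatever signal best
# reduces training loss, even if that signal is a data-collection artifact.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)

# "True" signal: noisy but genuinely tied to the label.
true_feat = y + rng.normal(0, 1.0, n)
# Spurious signal: nearly noise-free at train time; its correlation
# with the label is an artifact of how the data was collected.
spurious_feat = y + rng.normal(0, 0.1, n)

X_train = np.column_stack([true_feat, spurious_feat])
clf = LogisticRegression().fit(X_train, y)
print("learned weights:", clf.coef_)  # spurious feature dominates

# At test time the spurious correlation no longer holds.
y_test = rng.integers(0, 2, n)
X_test = np.column_stack([
    y_test + rng.normal(0, 1.0, n),  # true signal persists
    rng.normal(0, 0.1, n),           # spurious feature is now pure noise
])
print("train accuracy:", clf.score(X_train, y))
print("test accuracy:", clf.score(X_test, y_test))  # far below train accuracy
```

Nothing in the training objective distinguishes the two features, which is the point of the quoted passage: the same mechanism that makes models exploit dataset bias also leaves them exposed to imperceptible adversarial perturbations of the features they over-rely on.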