“…nine natural language understanding (NLU) tasks. As shown in Table 1, it includes question answering (Rajpurkar et al., 2016), linguistic acceptability (Warstadt et al., 2018), sentiment analysis (Socher et al., 2013), text similarity (Cer et al., 2017), paraphrase detection (Dolan and Brockett, 2005), and natural language inference (NLI) (Bar-Haim et al., 2006; Giampiccolo et al., 2007; Bentivogli et al., 2009; Levesque et al., 2012; Williams et al., 2018). The diversity of these tasks makes GLUE well suited to evaluating the generalization and robustness of NLU models.…”