“…Other domain tasks are transferable to NLI. Our work can be extended to test LLMs on other NLP applications (Plank, 2022) such as Question Answering (De Marneffe et al, 2019), Fact Verification (Thorne et al, 2018), and Toxic Language Detection (Schmidt and Wiegand, 2017; Sandri et al, 2023). Further, our method can be applied to tasks that contain disagreements, since they are easily transferable to NLI tasks (Dagan et al, 2006), as with the QNLI dataset from Table 2. For example, instead of directly asking the model controversial questions (e.g., about abortion) (Santurkar et al, 2023), the question can be rewritten as a declarative statement in the premise, with a possible answer placed in the hypothesis and a binary True/False label (Dagan et al, 2006).…”
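
The reformulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `NLIExample` and `to_nli_examples` are hypothetical, the example statement and answers are invented for illustration, and rewriting the question into a declarative statement is assumed to have been done beforehand. Each candidate answer becomes the hypothesis of one premise-hypothesis pair, and the binary True/False label is left empty for a model or annotator to fill in.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NLIExample:
    """One premise-hypothesis pair with a binary True/False label (hypothetical container)."""
    premise: str
    hypothesis: str
    label: Optional[bool] = None  # None until a model or annotator supplies True/False


def to_nli_examples(declarative_statement: str,
                    candidate_answers: List[str]) -> List[NLIExample]:
    """Pair one declarative premise with each non-empty candidate answer as a hypothesis."""
    premise = declarative_statement.strip()
    return [
        NLIExample(premise=premise, hypothesis=answer.strip())
        for answer in candidate_answers
        if answer.strip()  # skip blank candidates
    ]


# Illustrative input only: the statement and answers are made up for this sketch.
examples = to_nli_examples(
    "Cities should ban cars from their centers.",
    [
        "Most people agree with this statement.",
        "Most people disagree with this statement.",
    ],
)

for ex in examples:
    print(f"Premise: {ex.premise}\nHypothesis: {ex.hypothesis}\nLabel: {ex.label}\n")
```

Keeping the label optional separates constructing the pairs from judging them, so the same pairs can be passed to several models or annotators and their True/False decisions compared.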