“…Once again, natural language processing offers an excellent example: language models are generally trained on one or more general-purpose objectives (e.g., next-word prediction), and, after (often minimal) fine-tuning, they are evaluated against composite benchmarks (e.g., Sakaguchi, Le Bras, Bhagavatula, & Choi, 2019; Wang et al., 2019). In this regard, a particularly interesting example is that of GPT-3 (T. B.…”