“…There have been some attempts at comparing different similarity models, both in cognitive science (Bullinaria & Levy, 2007; Burgess & Lund, 2000; Landauer & Dumais, 1997; Rohde, Gonnerman, & Plaut, 2006; Mandera et al., 2017; Pereira, Gershman, Ritter, & Botvinick, 2016) and in computational linguistics (e.g., Gerz, Vulić, Hill, Reichart, & Korhonen, 2016; Hill, Cho, Jean, Devin, & Bengio, 2014; Ponti, Vulić, Glavaš, Mrkšić, & Korhonen, 2020; Wieting, Bansal, Gimpel, & Livescu, 2015). However, most of this work has used benchmark datasets (e.g., the TOEFL dataset and SimLex-999; Hill, Reichart, & Korhonen, 2015; Landauer & Dumais, 1997) that sample pairs of words from the entire lexicon and include a large number of rather unrelated word pairs (for a similar point, see De Deyne, Navarro, Collell, & Perfors, 2021). For example, SimLex-999 includes pairs such as "wife-straw" and "ankle-window."…”