“…In a broader sense, we believe that our study necessitates a field-wide discussion on what exactly makes an effect a "qualitative benchmark", and what standards a pattern of data should have to meet to become a benchmark that we base theories and models on. For instance, the Stroop effect (see Stroop, 1935; Tillman, Eidels, & Finkbeiner, 2016; Tillman, Howard, Garret, & Eidels, 2017) has been replicated at the group level for the last 80 years (MacLeod, 1991) and is also consistent across individuals (Haaf & Rouder, 2017), suggesting that it is a strong qualitative benchmark for developing theories. In the speeded decision-making literature, a reliable effect is the positive skew of RT distributions, which is generally considered to be a ubiquitous trend (Evans, Hawkins, Boehm, Wagenmakers, & Brown, 2017).…”