“…His conceptualization of measurement-driven rating scale construction has highlighted the shortcomings of certain types of level descriptors, has unearthed mismatches between rating criteria and real-world TLU characteristics, and has stressed the need for empirically founded criteria (but see also Alderson, 2007; Jacoby & McNamara, 1999). However, various publications in the field of language testing have shown that a dichotomous rating scale typology (i.e., measurement-driven vs. performance-driven) may not correspond to actual practice, as many rating scales emerge from a variety of sources, including expert input, empirical performance data, and existing language proficiency frameworks such as the CEFR (Deygers & Van Gorp, 2015; Galaczi et al., 2011; Harsch & Martin, 2012; Knoch, 2009). Even the CEFR (Council of Europe, 2001), criticized by Fulcher (2004, 2012) as an example of statistically driven, intuition-based design, describes rating scale development as the process of combining intuitive, qualitative, and quantitative methods.…”