“…Some raters viewed a binary scale as easy to judge and practical (Jeong, 2019; Khamboonruang, 2020; Park & Yan, 2019), whereas others perceived it as cognitively demanding and difficult to judge (Kim, 2010; Park & Yan, 2019). In terms of rater behaviours, previous research has found that variability in raters' scoring was influenced by training (Yan & Chuang, 2022), time of rating (Lamprianou et al., 2021), writing genre (Jeong, 2017; Jiuliang, 2014), and rater characteristics. These characteristics include, but are not limited to, rater experience (Barkaoui, 2010, 2011; Şahan & Razı, 2020), rater fatigue (Mahshanian et al., 2017), rater personality (Zhu et al., 2021), rater age (Isbell, 2017), raters' perceptions of criterion importance (Eckes, 2012), and rater styles, strategies and preferences (Han, 2017). Prior studies also found that raters were more consistent when rating higher-quality essays than lower-quality essays (Han, 2017; Khamboonruang, 2020; Şahan & Razı, 2020), and that raters still differed significantly in severity even when they were well trained and experienced (e.g., Khamboonruang, 2020; Li, 2022; Mendoza & Knoch, 2018; Yan & Chuang, 2022).…”