“…Examples include the many-faceted Rasch model (Linacre, 1989); the FACETS model (Lunz, Wright, & Linacre, 1990); an IRT model for multiple raters (Verhelst & Verstralen, 2001); the rater bundle model; the hierarchical rater model (Patz, Junker, Johnson, & Mariano, 2002) and its signal detection theory version (DeCarlo, 2010; DeCarlo, Kim, & Johnson, 2011); and Yao's rater model (Wang & Yao, 2013). These models are most useful when all the CR items of an assessment have been scored and merged with the multiple-choice items (Sgammato & Donoghue, 2018). However, in some testing programs, a scoring team consists of a group of 10 to 12 raters who are led by a supervisor and a trainer.…”