“…As shown in Figures 3-6 and Appendices 7 and 8, valid comparison to the mixed-effects logistic regression's inference is each anesthesiologist's mean score equally weighting each rater, not the mean pooled score, as used in the scenario. 1,6 We previously showed that the two are different because of the inequality of the variabilities of scores among raters (P < 0.001). 6 As shown in Figures 3 and 4 and Appendix 5, the mean scores cannot reliably be compared among anesthesiologists, once the anesthesiologists have received feedback and learned how to provide better supervision, unless adjustment is made for the leniency of raters.…”
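
The distinction between the mean pooled score and the mean score equally weighting each rater can be illustrated with a minimal sketch. The rater names, the scores, and the 1-4 scale below are hypothetical values chosen only for illustration; they are not data from the cited study, and the functions are not the authors' implementation.

```python
from statistics import mean

# Hypothetical scores given to ONE anesthesiologist by three raters.
# Assumption for illustration: rater_a is lenient and submits many evaluations.
scores_by_rater = {
    "rater_a": [4.0, 4.0, 4.0, 4.0, 4.0, 4.0],
    "rater_b": [3.0, 3.0],
    "rater_c": [2.0],
}


def pooled_mean(scores_by_rater):
    """Mean of all scores pooled together; frequent raters dominate."""
    all_scores = [s for scores in scores_by_rater.values() for s in scores]
    return mean(all_scores)


def equally_weighted_mean(scores_by_rater):
    """Mean of the per-rater means; each rater counts once."""
    return mean(mean(scores) for scores in scores_by_rater.values())


print(pooled_mean(scores_by_rater))            # 32 / 9, about 3.56
print(equally_weighted_mean(scores_by_rater))  # 3.0
```

In this made-up case the pooled mean is pulled toward the lenient rater who contributed the most evaluations, while the equally weighted mean is not. Neither quantity adjusts for rater leniency, which is the separate adjustment the passage refers to.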