Rater effects in performance testing are an area in which much new research is needed (C. M. Myford, personal communication, 23 February, 2010). While previous studies of bias or interaction effects as a component of rater effects have employed experienced teachers as raters (e.g., Schaefer, 2008), the present study uses many-facet Rasch measurement (MFRM) to investigate differential rater effects, namely rater severity and leniency, among three rater types: self-assessors, peer-assessors, and teacher-assessors. Essays written in English by 188 Iranian English majors at two state-run universities in Iran were rated by the students themselves as self-assessors and peer-assessors and by teachers, using a 6-point analytic rating scale. MFRM revealed differing patterns of severity and leniency among the three assessor types. For example, when assessing the highest- and lowest-ability students, self-assessors and teacher-assessors showed a pattern of severity and leniency opposite to that of peer-assessors. This study has implications for the use of peer- and self-rating in L2 writing assessment.
Professionalism requires that language teachers be assessment literate so that they can assess students' performance more effectively. However, assessment literacy (AL) has remained a relatively unexplored area. Given the centrality of AL in educational settings, the present study identified the factors constituting AL among university instructors and examined the ways English Language Instructors (ELIs) and Content Instructors (CIs) differed in AL. A researcher-made, 50-item questionnaire was constructed and administered to both groups: ELIs (N = 155) and CIs (N = 155). A follow-up interview was conducted to validate the findings. IBM SPSS (version 21) was used to analyse the data quantitatively. Results of exploratory factor analysis showed that AL comprised three factors: the theoretical dimension of testing, test construction and analysis, and statistical knowledge. Further, the results revealed statistically significant differences between ELIs and CIs in AL. Qualitative results showed that the differences were primarily related to the amount of training in assessment, methods of evaluation, purposes of assessment, and familiarity with the psychometric properties of tests. Building on these findings, we discuss implications for teachers' professional development.
In this study, the researcher used the many-facet Rasch measurement model (MFRM) to detect two pervasive rater errors, the central tendency effect and the halo effect, among peer-assessors rating EFL essays. The researcher also compared the ratings of peer-assessors with those of teacher-assessors to gain a clearer understanding of peer-assessors' ratings. To that end, the researcher used a fully crossed design in which all peer-assessors rated all essays written by MA students enrolled in two Advanced Writing classes at two private universities in Iran. The peer-assessors used a 6-point analytic rating scale to evaluate the essays on 15 assessment criteria. The results of Facets analyses showed that, as a group, peer-assessors exhibited neither a central tendency effect nor a halo effect; however, individual peer-assessors showed varying degrees of both effects. Further, the ratings of peer-assessors and those of teacher-assessors did not differ statistically significantly.