2008
DOI: 10.1177/0265532208094273

Rater bias patterns in an EFL writing assessment

Abstract: The present study employed multi-faceted Rasch measurement (MFRM) to explore the rater bias patterns of native English-speaker (NES) raters when they rate EFL essays. Forty NES raters rated 40 essays written by female Japanese university students on a single topic adapted from the TOEFL Test of Written English (TWE). The essays were assessed using a six-category rating scale (Content, Organization, Style and Quality of Expression, Language Use, Mechanics, and Fluency). MFRM revealed several recurring bias patt…
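For orientation, MFRM refers to the many-facet Rasch model. A common formulation (the standard Linacre/FACETS parameterization; the truncated abstract does not spell out the paper's exact model, so this is offered only as a reference sketch) for examinee n, rating category i, rater j, and scale step k is

\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

where B_n is the examinee's ability, D_i the difficulty of rating category i, C_j the severity of rater j, and F_k the difficulty of awarding step k over step k-1. Bias analysis then looks for systematic interactions (e.g., rater-by-examinee) that remain after these main effects are modelled.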

Cited by 105 publications (105 citation statements)
References: 13 publications
“…This indicates that the raters demonstrate more severity in rating when rating highly competent test takers; however, they were fairly lenient in their ratings toward extremely weak test takers. This finding is parallel, albeit in a writing assessment test, to one found by Schaefer (2008), who in an analysis of ratings by 40 native English speakers of 40 essays by Japanese students found some raters scored higher ability test takers more severely and lower ability ones more leniently than expected. The reason of this interaction tendency is not quite clear; however, it might be due to the fact that raters' expectations of test takers rise as test takers' abilities increase, thus making their judgments severer.…”
Section: Results (supporting); confidence: 80%
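The rater-by-examinee interaction described in the excerpt above can be expressed, in a standard MFRM bias analysis (a sketch of the usual formulation, not taken from Schaefer's paper), by adding an interaction term \phi_{nj} for rater j and examinee n:

\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k - \phi_{nj}

Under this convention, a positive \phi_{nj} indicates that rater j scores examinee n more severely than the rater's overall severity C_j would predict (sign conventions vary across software). The pattern reported here corresponds to positive bias terms for high-ability examinees and negative terms for low-ability examinees.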
“…Assessing writing ability and the reliability of ratings have been a challenging concern for decades and there is always variation in the elements of writing preferred by raters and there are extraneous factors causing variation (Blok, 1985; Chase, 1968; Chase, 1983; Darus, 2006; East, 2009; Engelhard, 1994; Gyagenda & Engelhard, 1998a; Gyagenda & Engelhard, 1998b; Hughes, Keeling & Tuck, 1980; Hughes, Keeling & Tuck, 1983; Hughes & Keeling, 1984; Kan, 2005; Klein & Hart, 1968; Klein & Taub, 2005; Marshall & Powers, 1969; Murphy & Balzer, 1989; Schaefer, 2008; Slomp, 2012; Sulsky & Balzer, 1988; Wexley & Youtz, 1985; Woehr & Huffcutt, 1994). Fisher, Brooks, and Lewis (2002) state fitness for purpose requirement is the core of all testing work, and direct writing assessments are subjective and thereby more prone to reliability issues.…”
Section: Conclusion and Recommendations (mentioning); confidence: 99%
“…In the rating process, various factors come into play: rater characteristics towards severity or leniency (Schaefer, 2008; Shi, 2001), rater training experience (Huot, 1990; Weigle, 1998, 2002), rater's language background (Kondo-Brown, 2002; Lumley & McNamara, 1995), and task variability (O'Loughlin & Wigglesworth, 2007) are factors that have been researched over the years in performance assessment. Past research on rubric studies has focused on investigating changes in rater reliability (Lumley & McNamara, 1995; McNamara, 1996; Weigle, 1998).…”
Mentioning; confidence: 99%