2008
DOI: 10.1177/0265532208094273

Rater bias patterns in an EFL writing assessment

Abstract: The present study employed multi-faceted Rasch measurement (MFRM) to explore the rater bias patterns of native English-speaker (NES) raters when they rate EFL essays. Forty NES raters rated 40 essays written by female Japanese university students on a single topic adapted from the TOEFL Test of Written English (TWE). The essays were assessed using a six-category rating scale (Content, Organization, Style and Quality of Expression, Language Use, Mechanics, and Fluency). MFRM revealed several recurring bias patt…
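For orientation, MFRM refers to the many-facet Rasch model. A common formulation (the standard Linacre/FACETS parameterization; the truncated abstract does not spell out the paper's exact model, so this is offered only as a reference sketch) for examinee n, rating category i, rater j, and scale step k is

\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

where B_n is the examinee's ability, D_i the difficulty of rating category i, C_j the severity of rater j, and F_k the difficulty of awarding step k over step k-1. Bias analysis then looks for systematic interactions (e.g., rater-by-examinee) that remain after these main effects are modelled.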

Cited by 105 publications (105 citation statements)
References: 13 publications
“…This indicates that the raters demonstrate more severity in rating when rating highly competent test takers; however, they were fairly lenient in their ratings toward extremely weak test takers. This finding is parallel, albeit in a writing assessment test, to one found by Schaefer (2008), who in an analysis of ratings by 40 native English speakers of 40 essays by Japanese students found some raters scored higher ability test takers more severely and lower ability ones more leniently than expected. The reason of this interaction tendency is not quite clear; however, it might be due to the fact that raters' expectations of test takers rise as test takers' abilities increase, thus making their judgments severer.…”
Section: Results (supporting); confidence: 80%
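The rater-by-examinee interaction described in the excerpt above can be expressed, in a standard MFRM bias analysis (a sketch of the usual formulation, not taken from Schaefer's paper), by adding an interaction term \phi_{nj} for rater j and examinee n:

\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k - \phi_{nj}

Under this convention, a positive \phi_{nj} indicates that rater j scores examinee n more severely than the rater's overall severity C_j would predict (sign conventions vary across software). The pattern reported here corresponds to positive bias terms for high-ability examinees and negative terms for low-ability examinees.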
“…Assessing writing ability and the reliability of ratings have been a challenging concern for decades and there is always variation in the elements of writing preferred by raters and there are extraneous factors causing variation (Blok, 1985; Chase, 1968; Chase, 1983; Darus, 2006; East, 2009; Engelhard, 1994; Gyagenda & Engelhard, 1998a; Gyagenda & Engelhard, 1998b; Hughes, Keeling & Tuck, 1980; Hughes, Keeling & Tuck, 1983; Hughes & Keeling, 1984; Kan, 2005; Klein & Hart, 1968; Klein & Taub, 2005; Marshall & Powers, 1969; Murphy & Balzer, 1989; Schaefer, 2008; Slomp, 2012; Sulsky & Balzer, 1988; Wexley & Youtz, 1985; Woehr & Huffcutt, 1994). Fisher, Brooks, and Lewis (2002) state fitness for purpose requirement is the core of all testing work, and direct writing assessments are subjective and thereby more prone to reliability issues.…”
Section: Conclusion and Recommendations (mentioning); confidence: 99%
“…In the rating process, various factors come into play: rater characteristics towards severity or leniency (Schaefer, 2008; Shi, 2001), rater training experience (Huot, 1990; Weigle, 1998, 2002), rater's language background (Kondo-Brown, 2002; Lumley & McNamara, 1995), and task variability (O'Loughlin & Wigglesworth, 2007) are factors that have been researched over the years in performance assessment. Past research on rubric studies has focused on investigating changes in rater reliability (Lumley & McNamara, 1995; McNamara, 1996; Weigle, 1998).…”
Mentioning; confidence: 99%