1930
DOI: 10.1037/h0071103

Reliability of repeated grading of essay type examinations.

Abstract: The wide variability shown by different teachers in grading the same examinations of the essay type is well known. The classic experiments of Starch and Elliott on reliability of grading history, English, and geometry papers are well known and have been repeated in various forms. But there seems to have been little investigation of the reliability of regrading of the same material by the same teachers. Starch reports a brief investigation in which seven college instructors were asked to regrade a set of ten…

Cited by 57 publications
(18 citation statements)
“…In line with earlier research on the statewide tests administered by the CDE (Wilson & Case, 2000; Wilson & Wang, 1995) and research by others from as early as the 1930s (e.g., Ashburn, 1938; Eells, 1930) significant variability between raters was found in their essay scoring. Moreover, significant variation was also found within raters over time, confirming findings reported by Braun (1988).…”
Section: Discussion (supporting)
confidence: 85%
“…Similarly, Jacoby (1910) interpreted his high agreement as a result of the high quality of the papers in his sample. Eells (1930), however, found greater grading consistency in the poorer papers. Lauterbach (1928) found more grading variability for typewritten compositions than for handwritten versions of the same work.…”
Section: Grading Reliability (mentioning)
confidence: 89%
“…Many of these early studies communicated a "what's wrong with teachers" undertone that today would likely be seen as researcher bias. Early researchers attributed sources of variation in teachers' grades to one or more of the following sources: criteria (Ashbaugh, 1924; Brimi, 2011; Healy, 1935; Silberstein, 1922; Sims, 1933; Starch, 1915; Starch & Elliott, 1913a, 1913b), students' work quality (Bolton, 1927; Healy, 1935; Jacoby, 1910; Lauterbach, 1928; Shriner, 1930; Sims, 1933), teacher severity/leniency (Shriner, 1930; Silberstein, 1922; Sims, 1933; Starch, 1915; Starch & Elliott, 1913b), task (Silberstein, 1922; Starch & Elliott, 1913a), scale (Ashbaugh, 1924; Sims, 1933; Starch, 1913, 1915), and teacher error (Brimi, 2011; Eells, 1930; Hulten, 1925; Lauterbach, 1928; Silberstein, 1922; Starch & Elliott, 1912, 1913a). Starch (1913; Starch & Elliott, 1913b) found that teacher error and emphasizing different criteria were the two largest sources of variation.…”
Section: Discussion: What Do Grades Mean? (mentioning)
confidence: 99%
“…The difference vis-a-vis peer review is the use of a marking scheme, however imprecise, which is practised by the examiners. This difference is immediately apparent in comparison with university examinations (Byrne 1980;Cox 1967;Eells 1930;Hartog et al 1936;Laming 1990; again excluding mathematical and physical sciences), which usually have no such scheme. Take away any pretence at a marking scheme and the reliability of examination marks falls to near the levels reported for peer review.…”
Section: Decision Of Review Process Accepted Rejected Submissions (mentioning)
confidence: 98%