The wide variability shown by different teachers in grading the same essay-type examinations is well known. The classic experiments of Starch and Elliott on the reliability of grading history, English, and geometry papers have been repeated in various forms. 1 But there seems to have been little investigation of the reliability of regrading of the same material by the same teachers. Starch reports a brief investigation in which seven college instructors were asked to regrade a set of ten of their own papers after intervals of two weeks to four years. In this case each question was regraded only once. 2

The object of this paper is to report the results of an experiment in which sixty-one teachers, members of the author's summer quarter class in tests and measurements, regraded the same set of material after an interval of eleven weeks. Practically all were experienced teachers. During the first week of the course they were given mimeographed sheets containing answers from grammar school geography and history papers, with instructions for grading them. The graded papers were handed in the next day. The geography questions were taken from Ruch's experiment; 3 those in history from an experiment reported by Paulu. 4 The questions used and instructions given were as follows:

GEOGRAPHY

Name and locate five of the largest cities of the United States and name their leading industries, exports, and imports.
ANSWER ONE

Five of the largest cities in the United States ia Detroit*. An export is Cars. And industry is Manufactoring. Chicago is an important city and an export is

SUMMARY

Repeated grading of the same essay type of material by the same teachers (sixty-one) after an interval of eleven weeks is very unreliable. Reliability coefficients vary from 0.25 to 0.51. Variability of human judgment in the same individual is about the same as variability between different individuals.
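
As a rough illustration of what a reliability coefficient of this kind measures, the sketch below computes a correlation between a first and a second grading of the same answers. It assumes the coefficient is a Pearson product-moment correlation, which the text above does not state. The function name `pearson_r` and the grade lists are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of grades.

    Assumption: the reliability coefficient is treated here as the
    correlation between the first and second grading of the same answers.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two grades")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("grades must vary in both gradings")
    return cov / sqrt(var_a * var_b)


# Hypothetical grades given by one teacher to six answers, eleven weeks apart.
# These numbers are made up for illustration only.
first_grading = [12, 15, 9, 18, 14, 10]
second_grading = [14, 11, 12, 16, 9, 13]

r = pearson_r(first_grading, second_grading)
print(f"regrading reliability r = {r:.2f}")
```

A value near 1 would mean the second grading closely reproduces the first. Values in the range reported in the summary, 0.25 to 0.51, indicate only weak to moderate agreement between a teacher's two gradings.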