Several parts of the STEP Writing Test, Level 1, were administered to 14 different groups of 19 to 52 high school students. In the testing situations, scores were computed using the following scoring functions: (a) probability assigned to the correct answer, (b) the logarithmic function, (c) the spherical function, (d) the Euclidean function, and (e) inferred choice. Reliabilities of the scores obtained by means of each scoring function were computed. Comparisons between the reliabilities showed that the simplest and most intuitive function, the probability assigned to the correct answer, produced higher reliability than any of the other functions. The data suggest that, in the absence of information about the scoring system, subjects assign their confidence in multiple-choice responses on the basis of the intuitively simplest payoff model, and that reliability decreases as scoring functions generate item scores that are progressively more discrepant from the scores generated by the simplest model.
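The abstract names the five scoring functions but does not give their formulas. As a rough illustration only, the sketch below scores a single item from a vector of probabilities assigned to the options, using common textbook forms of each rule. The function name `score_item`, the reading of the Euclidean function as one minus the distance to the correct-answer indicator vector, and the tie rule for inferred choice are all assumptions, not details taken from the study.

```python
import math


def score_item(probs, correct):
    """Score one item under five rules (hypothetical helper, textbook forms).

    probs   -- probabilities the examinee assigned to each option (sum to 1)
    correct -- index of the keyed (correct) option
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")

    p_c = probs[correct]
    # Length of the response vector, used by the spherical rule.
    norm = math.sqrt(sum(p * p for p in probs))
    # Distance from the response vector to the indicator vector of the key.
    dist = math.sqrt(
        sum((p - (1.0 if i == correct else 0.0)) ** 2 for i, p in enumerate(probs))
    )
    top = max(probs)

    return {
        # (a) probability assigned to the correct answer
        "probability": p_c,
        # (b) logarithmic rule; zero probability on the key gives -infinity
        "logarithmic": math.log(p_c) if p_c > 0 else float("-inf"),
        # (c) spherical rule
        "spherical": p_c / norm,
        # (d) Euclidean rule, ASSUMED here to be 1 minus the distance above
        "euclidean": 1.0 - dist,
        # (e) inferred choice, ASSUMED: 1 only if the key is the unique top option
        "inferred_choice": 1.0 if p_c == top and probs.count(top) == 1 else 0.0,
    }


scores = score_item([0.7, 0.2, 0.1], correct=0)
```

Under these assumed forms, rules (b) through (d) are nonlinear transformations of the response vector, so their item scores can diverge from the plain probability score in (a), which is the kind of discrepancy the abstract relates to lower reliability.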
A large number of schools administer the STEP Writing Test (1957). Although the publisher of STEP has clearly stated the objectives which STEP was designed to measure, classroom teachers frequently ask just what one can tell about a student's actual writing behavior from the results of a multiple-choice test. The manual for interpreting scores claims that STEP measures ability to think critically in writing, to organize materials, to write material appropriate for a given purpose, to write effectively, and to observe conventional usage in punctuation and grammar (p. 7). The manual further states that "The STEP Writing tests seek to measure comprehensively the full range of skills involved in the process of good writing." Items on the STEP are classified according to five categories: (1) organization, (2) conventions, (3) critical thinking, (4) effectiveness, and (5) appropriateness. Black (Buros, 1959, pp. 593-594) asserts that STEP fails in all but the second category and is only partially successful in measuring conventions. He arrived at his conclusions from his analysis of item content. Perhaps revealing his own biases concerning multiple-choice writing tests, he concludes that "any educator who wishes to measure the full range of skills involved in the process of good writing will resort to writing itself." Hieronymous (Buros, 1959, p. 595) concludes, again from his analysis of item content, that STEP measures "very effectively higher-order writing skills, particularly those of effectiveness and appropriateness."
A system for responding to and scoring multiple-choice tests is proposed. This system asks students to express their distribution of preference for options as well as their certainty in that distribution. Such a system of scoring allows the use of types of test items which have previously been ignored.