“…Englehart (1965) reported correlations between D and r_pb of .92 and .95 on two forms of a high school 60‐item history exam. Oosterhof (1976) reported a correlation of .94 from a 50‐item verbal analogy test (Differential Aptitude Test) of 1,000 high school students. In a Monte Carlo study varying the sample size, number of factors in an instrument, and item difficulty, Beuchert and Mendoza (1979) found differences among 10 indices of item discrimination “to be extremely small or nonexistent in situations tending to accentuate those differences” (p. 116).…”
Multiple‐choice items are a mainstay of achievement testing. Adequately covering the content domain to certify achievement proficiency, and producing meaningful, precise scores, requires many high‐quality items. More 3‐option items than 4‐ or 5‐option items can be administered in the same testing time, improving content coverage without detrimental effects on the psychometric quality of test scores. For over 80 years, researchers have endorsed 3‐option items on empirical grounds; those results are synthesized here in an effort to unify this endorsement and encourage its adoption.
“…Since item responses are generally recorded as right or wrong, the measurement of item discrimination usually involves a dichotomous variable (performance on the item) and a continuous variable (performance on the criterion). Many different indexes of item discrimination have been developed and used, but, despite differences in procedures and assumptions, most of the indexes provide similar results (Oosterhof, 1976). In other words, although the numerical values of the indexes may differ, the items that are retained and those that are rejected on the basis of different discrimination indexes are largely the same.…”
We show that using the point‐biserial as a discrimination index for distractors by differentiating between examinees who chose the distractor and examinees who did not choose the distractor is theoretically wrong and may lead to an incorrect rejection of items. We propose an alternative usage and present empirical evidence for its suitability.
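To make the index under discussion concrete, here is a minimal sketch (plain Python; the function and variable names are ours, not the authors') of the point‐biserial correlation between a dichotomous item variable and a continuous criterion such as the total test score. Coding "chose the distractor" as 1 and "did not choose it" as 0 yields the distractor‐level usage the abstract argues against.

```python
import math

def point_biserial(dichotomous, criterion):
    """Point-biserial correlation between a 0/1 variable (e.g., item
    right/wrong, or distractor chosen/not chosen) and a continuous
    criterion (e.g., total test score). Equivalent to the Pearson
    correlation with one variable dichotomous."""
    n = len(dichotomous)
    p = sum(dichotomous) / n          # proportion coded 1
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("dichotomous variable must contain both 0s and 1s")
    mean1 = sum(c for d, c in zip(dichotomous, criterion) if d == 1) / (n * p)
    mean0 = sum(c for d, c in zip(dichotomous, criterion) if d == 0) / (n * q)
    mean_all = sum(criterion) / n
    sd = math.sqrt(sum((c - mean_all) ** 2 for c in criterion) / n)  # population SD
    return (mean1 - mean0) / sd * math.sqrt(p * q)

# Hypothetical data: 1 = answered the item correctly, paired with total scores.
r_pb = point_biserial([1, 1, 0, 0], [3, 2, 2, 1])
```

A positive r_pb for the keyed answer (higher‐scoring examinees tend to answer correctly) is the usual retention criterion; applying the same formula to a distractor coded 1/0 is the practice the authors show can incorrectly reject items.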
“…We have studied only the φ‐coefficient of discrimination in MCQ and TF examinations. It has been shown, however, that all the commonly used indices of discrimination are based on similar assumptions and yield comparable results when used to analyse the same data (Engelhardt 1965; Aleomoni & Spencer 1969; Hales 1972; Oosterhof 1976; Beuchert & Mendoza 1979). Furthermore, as it is our contention that the problems uncovered here are not caused by any fault of the φ‐test as such, but by the inappropriateness of the data for the test, it is reasonable to expect that all the indices of discrimination will show the same variability as the φ when used for item analysis in MCQ and TF examinations.…”
The phi-coefficient of an item in a multiple choice question (MCQ) examination is often used to determine whether that item is suitable for re-use in future examinations. To be of value in this regard, the coefficient must be shown to be an objective and consistent index of the discriminating power of an MCQ item. The behaviour of the phi-coefficient (phi) was investigated in two one-from-five MCQ and two true/false examinations. It is shown that the magnitude of the phi-coefficient for the items in any examination is a function not only of the discriminating power of the items, but also of the magnitude of the countermark for incorrect responses, of the proportion of 'don't know' responses in the examination, and of how the 'don't know' responses are handled in the analysis. It is further shown that the reproducibility of phi, when calculated for any pair of randomly selected portions of a class of students, is very poor. We conclude that the error of estimation of phi is of the same order of magnitude as phi's normal operative range.
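For reference, the phi-coefficient discussed above is the Pearson correlation of two dichotomous variables, computable from a 2×2 table. A minimal sketch (plain Python; names and the median-split criterion are our assumptions, not the authors' procedure) correlating item response (right/wrong) with a dichotomized criterion such as above/below the median total score:

```python
import math

def phi_coefficient(x, y):
    """Phi coefficient between two 0/1 variables, e.g. item right/wrong (x)
    against above/below-median total score (y). Built from the 2x2
    contingency table: phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d))."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("one marginal is constant; phi is undefined")
    return (a * d - b * c) / denom

# Hypothetical data: perfect agreement gives phi = 1; independence gives 0.
phi = phi_coefficient([1, 1, 0, 0], [1, 1, 0, 0])
```

The abstract's point is that in real MCQ/TF data the cell counts a, b, c, d are distorted by countermarks and 'don't know' handling, so phi estimated this way varies far more across samples than its operative range allows.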