2016
DOI: 10.1111/jedm.12099

Does Maximizing Information at the Cut Score Always Maximize Classification Accuracy and Consistency?

Abstract: A common suggestion made in the psychometric literature for fixed-length classification tests is that one should design tests so that they have maximum information at the cut score. Designing tests in this way is believed to maximize the classification accuracy and consistency of the assessment. This article uses simulated examples to illustrate that one can obtain higher classification accuracy and consistency by designing tests that have maximum test information at locations other than at the cut score. We s…
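The kind of simulation the abstract describes can be sketched roughly as follows. This is an illustrative Monte Carlo under the Rasch model, not the authors' actual code: the cut score, item counts, ability distribution, and sum-score pass rule are all assumptions made for the example.

```python
# Illustrative sketch (assumed design, not the authors' code): compare the
# classification accuracy of a test whose information peaks at the cut score
# with one whose information peaks elsewhere, under the Rasch model.
import numpy as np

rng = np.random.default_rng(0)

def simulate_accuracy(item_difficulties, cut=0.5, n_examinees=20000):
    theta = rng.normal(0.0, 1.0, n_examinees)        # examinee abilities
    b = np.asarray(item_difficulties, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b)))  # Rasch response probabilities
    responses = rng.random(p.shape) < p
    scores = responses.sum(axis=1)
    # Pass an examinee if their raw score is at least the expected score of a
    # borderline examinee (the test characteristic curve evaluated at the cut).
    tcc_at_cut = (1.0 / (1.0 + np.exp(-(cut - b)))).sum()
    decision = scores >= tcc_at_cut
    truth = theta >= cut
    return (decision == truth).mean()

cut = 0.5
centered = np.full(30, cut)        # test information peaks at the cut score
shifted  = np.full(30, cut + 1.0)  # test information peaks above the cut
print(f"info at cut:  accuracy = {simulate_accuracy(centered, cut):.3f}")
print(f"info off cut: accuracy = {simulate_accuracy(shifted,  cut):.3f}")
```

Running variations of this design (different item counts, cut locations, and ability distributions) is one way to see the article's point that the information-at-the-cut test does not always win.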

Cited by 7 publications (8 citation statements). References 18 publications.
“…The findings of this study are aligned with what one would expect based upon previous empirical work. As Wyse and Babcock (2016) demonstrated, group ability and the location of maximum information of test items have an impact on classification accuracy. It would therefore be expected that classification accuracy would be different for subgroups on an examination, depending upon the location of maximum test information and subgroup means.…”
Section: Discussion
confidence: 99%
“…These test lengths were chosen to approximate reliability values of .70, .80, and .90, with .70 indicative of low reliability, .80 indicative of moderate reliability, and .90 indicative of high reliability. Additionally, Wyse and Babcock (2016) found larger differences in classification accuracy when the number of items was at or below 50 items. Reliability was calculated as Rasch person separation reliability (Linacre, ).…”
Section: Methods
confidence: 99%
“…Depending on the purpose of the test, a better item might be the one that maximizes the information at another ability level. For example, in a licensure examination, where the aim is to determine whether examinees are above or below a cut score, test developers might want to administer items that have maximum information at the cut score or somewhere close to the cut score (Wyse & Babcock, 2016). The exact locations of the examinees on the ability scale might not be the primary purpose of the examination.…”
Section: The Optimum Item and Item Pool
confidence: 99%
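The item-selection idea in the quoted passage can be illustrated with the standard 2PL item information function, I(θ) = a²P(θ)(1 − P(θ)): for a pass/fail decision, one candidate rule is to administer the pool item with the most information at the cut score. The item pool below is made up for illustration.

```python
# Hypothetical sketch of selecting the most informative item at a cut score
# under the 2PL model. The pool of (a, b) parameters is invented for the example.
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Toy item pool: (discrimination a, difficulty b) pairs.
pool = [(1.0, -1.0), (1.2, 0.4), (0.8, 0.5), (1.5, 1.5)]
cut = 0.5

best = max(pool, key=lambda ab: item_information(cut, *ab))
print("most informative item at the cut:", best)  # (1.2, 0.4)
```

Note that the winner is not the item whose difficulty sits exactly at the cut: a more discriminating item slightly off target can carry more information there, which is part of why "maximize information at the cut" is a heuristic rather than a guarantee.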