2011
DOI: 10.1177/0013164410375111

The Long-Term Sustainability of Different Item Response Theory Scaling Methods

Abstract: This article investigates the accuracy of examinee classification into performance categories and the estimation of the theta parameter for several item response theory (IRT) scaling techniques when applied to six administrations of a test. Previous research has investigated only two administrations; however, many testing programs equate tests across multiple administrations. As such, this article seeks to examine the long-term sustainability of IRT scaling methods. Three different types of shifts in the abili…

Cited by 23 publications (27 citation statements)
References 15 publications
“…SL produced slightly greater passing misclassifications than MS when there was a moderate or sizable amount of ability shift. To a great extent, our findings are not in opposition to the results of related studies (Pang et al, 2010; Keller and Keller, 2011; Keller and Hambleton, 2013). …”
Section: Discussion (contrasting)
confidence: 62%
“…As noted in previous equating studies (Keller and Keller, 2011; Keller and Hambleton, 2013; Kolen and Brennan, 2014), model fit is a strong assumption that IRT equating is based on. Only when the fit between the model and the empirical data of interest is satisfactory, can the IRT equating be appropriately applied.…”
Section: Introduction (mentioning)
confidence: 86%
“…In particular, the Stocking and Lord Test Characteristic Curve method (SL; Stocking & Lord, 1983) has been shown to exhibit positive bias when there is a positive change in the ability distribution (e.g., Baldwin, Nering, & Baldwin, 2007), while the results from concurrent calibration have been mixed (e.g., Hanson & Béguin, 2002; Kim & Cohen, 1998). Fixed common item parameter (FCIP) scaling has been shown to produce minimal bias when two forms are scaled using this method, but does show some increasing bias as the number of scalings increases (Keller & Keller, 2011).…”
mentioning
confidence: 99%