2001
DOI: 10.1177/01466210122032226
Comparison of Dichotomous and Polytomous Item Response Models in Equating Scores from Tests Composed of Testlets

Abstract: The performance of two polytomous item response theory models was compared to that of the dichotomous three-parameter logistic model in the context of equating tests composed of testlets. For the polytomous models, testlet scores were used to eliminate the effect of the dependence among within-testlet items. Traditional equating methods were used as criteria for both. The equating methods based on polytomous models were found to produce results that more closely agreed with the results of traditional methods.
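The traditional equating criterion referred to in the abstract is, in studies of this kind, typically equipercentile equating: each form-X score is mapped to the form-Y score having the same percentile rank. A minimal Python sketch of that idea (the function name is illustrative, and operational equipercentile equating additionally applies continuity corrections and presmoothing, which are omitted here):

```python
import numpy as np

def equipercentile_equate(scores_x, scores_y, x):
    """Map a score x on form X to the form-Y score with the same percentile rank.

    scores_x, scores_y : observed score samples on the two test forms
    x                  : the form-X score to be equated
    """
    # Percentile rank of x in the empirical form-X distribution
    pr = np.mean(np.asarray(scores_x) <= x)
    # Form-Y score at that percentile (linear interpolation between order statistics)
    return float(np.percentile(scores_y, 100.0 * pr))
```

For example, if form Y is uniformly 5 points harder-scored than form X, a form-X score maps to roughly itself plus 5 on form Y.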

Cited by 30 publications (30 citation statements) | References 24 publications
“…First, among the three IRT-based equating methods, the raw-to-raw conversions produced by the GRM equating methods diverged most from the population conversions produced by the equipercentile method. This finding is not consistent with Lee et al.'s (2001) finding that the GRM equating method produced results more consistent with those of the equipercentile method than the 3PL method did. This inconsistency might be caused by the difference in model selection: the 2PL was selected in this study, whereas the 3PL was selected in Lee et al.'s study, and LID has been shown to have a large impact on the c-parameter estimation.…”
Section: Discussion (contrasting)
confidence: 87%
“…Alternatively, the generalized partial credit model is another valid choice for ordered categorical responses. Because neither model consistently exhibits superiority over the other based on the existing literature (Cao, Yin, & Gao, 2007; Lee et al., 2001; Tang & Eignor, 1997), the selection of the GRM is arbitrary in this study. The GRM directly models the cumulative category response function. Under the GRM, the probability of examinee j earning a score on item i at or above category k can be expressed as:…”
Section: The Graded Response Model (GRM) (mentioning)
confidence: 99%
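The cumulative response function the excerpt refers to is standard in Samejima's graded response model: the probability of scoring at or above category k is a logistic function of ability, and category probabilities are differences of adjacent cumulative probabilities. A small sketch under those standard assumptions (parameter names are illustrative; some formulations also multiply the logit by a scaling constant D = 1.7, omitted here):

```python
import numpy as np

def grm_probs(theta, a, b):
    """Category probabilities under Samejima's graded response model.

    theta : examinee ability
    a     : item discrimination
    b     : array of K-1 ordered category thresholds (K score categories)
    Returns an array of K probabilities, one per score category.
    """
    b = np.asarray(b, dtype=float)
    # Cumulative probability P* of responding at or above each category k >= 1
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Bound with P*(>= lowest category) = 1 and P*(> highest category) = 0,
    # then take differences of adjacent cumulative probabilities
    upper = np.concatenate(([1.0], p_star))
    lower = np.concatenate((p_star, [0.0]))
    return upper - lower
```

For a testlet scored 0–3 with symmetric thresholds around theta, the resulting category probabilities sum to 1 and are symmetric about the middle categories.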
“…It is well known that the Rasch model, and IRT models in general, are not robust with respect to violations of the local item independence assumption. The inclusion of items with local item dependence (LID) may result in contaminated estimates of test reliability, person and item parameters, standard errors, and equating coefficients; see, for instance, Yen (1984), Thissen, Steinberg and Mooney (1989), Sireci, Thissen and Wainer (1991), Yen (1993), Wainer and Thissen (1996), Lee, Kolen, Frisbie and Ankenmann (2001) and Tuerlinckx and De Boeck (2001). Next to this, some research has been devoted to the development of tests or indices for the detection of violations of the conditional independence assumption; see van den Wollenberg (1982), Rosenbaum (1984, 1988), Yen (1984), Stout (1987, 1990), Stout et al. (1996), Chen and Thissen (1997), Douglas, Kim, Habing and Gao (1998) and Ip (2001).…”
Section: Introduction (mentioning)
confidence: 99%
“…Traditionally, the effects of testlets were ignored and the items constituting a testlet were scored as if they were independent items (Bradlow, Wainer, & Wang, 1999; Lee, Kolen, Frisbie, & Ankenmann, 2001; Sireci et al., 1991; Wainer, 1995; Wainer & Wang, 2000; Yen, 1993). The DIF analyses performed in these studies are considered at the item level (Wainer & Thissen, 1996).…”
Section: Differential Item Functioning (DIF) (mentioning)
confidence: 99%