2014
DOI: 10.1002/ets2.12017

Effect of Item Response Theory (IRT) Model Selection on Testlet‐Based Test Equating

Abstract: The local item independence assumption underlying traditional item response theory (IRT) models is often not met for tests composed of testlets. There are 3 major approaches to addressing this issue: (a) ignore the violation and use a dichotomous IRT model (e.g., the 2‐parameter logistic [2PL] model), (b) combine the interdependent items to form a polytomous item and apply a polytomous IRT model (e.g., the graded response model [GRM]), and (c) apply a model that explicitly takes into account the dependence at …
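As a rough illustration of how approaches (a) and (c) differ, the sketch below compares the standard 2PL response probability with a common testlet response theory (TRT) formulation that subtracts a person-specific testlet effect from the ability term. The function names, the parameter values, and the specific TRT form are assumptions made for illustration; the abstract is truncated and does not state which testlet model the study used.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def p_testlet(theta: float, a: float, b: float, gamma: float) -> float:
    """Testlet-model probability (assumed form, for illustration only).

    gamma is the examinee's effect for the testlet containing the item.
    With gamma = 0 the model reduces to the 2PL.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b - gamma)))


if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the comparison.
    theta, a, b = 0.0, 1.2, 0.0
    for gamma in (0.0, 0.5, 1.0):
        print(gamma, p_2pl(theta, a, b), p_testlet(theta, a, b, gamma))
```

Under this form, a nonzero testlet effect shifts the response probability for every item in the same testlet in the same direction, which is the dependence that the 2PL ignores.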

Cited by 5 publications (14 citation statements)
References 24 publications

“…In terms of testlet effect, it was seen that both the 2PL model and GRM were more sensitive, whereas the TRT model and GRTM seemed relatively robust as testlet effect increased from low to high. This general pattern has been consistently observed in the previous study with the comparison of different IRT models on testlet-based test equating for the dichotomous items (Cao et al, 2014), but the previous study has not taken the polytomous items into consideration. Concerning the length of testlet items, it was clear, as discussed earlier, that the TRT model and GRTM were more accurate as the length of testlet items increased than were the 2PL model and GRM.…”
Section: Discussion; citation type: supporting
confidence: 81%
“…In the current practice of educational measurement, test equating is a vital step to put scores from different forms onto a same scale. However, in most large-scale testing programs, it is common for a standardized test to consist of testlets ( Bradlow et al, 1999 ; Rijmen, 2009 ; Cao et al, 2014 ; Tao and Cao, 2016 ). A testlet is defined as an aggregation of items which are based on a common stimulus ( Wainer and Kiely, 1987 ; Bradlow et al, 1999 ).…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…This finding was not informed by analytical strategies, but instead using a Monte Carlo simulation study (MCSS; also referred to simulation for short). "The [two-parameter logistic IRT] equating method was quite robust to the violation of local item independence " Cao, Lu, and Tao (2014), another finding rooted in MCSS. These are just two examples of the utility of simulation studies.…”
Citation type: mentioning
confidence: 99%