2001
DOI: 10.1177/01466210122032226
Comparison of Dichotomous and Polytomous Item Response Models in Equating Scores from Tests Composed of Testlets

Abstract: The performance of two polytomous item response theory models was compared to that of the dichotomous three-parameter logistic model in the context of equating tests composed of testlets. For the polytomous models, testlet scores were used to eliminate the effect of the dependence among within-testlet items. Traditional equating methods were used as criteria for both. The equating methods based on polytomous models were found to produce results that more closely agreed with the results of traditional methods.
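The traditional equating criterion referred to in the abstract is, in studies of this kind, typically equipercentile equating: each form-X score is mapped to the form-Y score having the same percentile rank. A minimal Python sketch of that idea (the function name is illustrative, and operational equipercentile equating additionally applies continuity corrections and presmoothing, which are omitted here):

```python
import numpy as np

def equipercentile_equate(scores_x, scores_y, x):
    """Map a score x on form X to the form-Y score with the same percentile rank.

    scores_x, scores_y : observed score samples on the two test forms
    x                  : the form-X score to be equated
    """
    # Percentile rank of x in the empirical form-X distribution
    pr = np.mean(np.asarray(scores_x) <= x)
    # Form-Y score at that percentile (linear interpolation between order statistics)
    return float(np.percentile(scores_y, 100.0 * pr))
```

For example, if form Y is uniformly 5 points harder-scored than form X, a form-X score maps to roughly itself plus 5 on form Y.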

Cited by 30 publications (30 citation statements) | References 24 publications
“…First, among the three IRT-based equating methods, the raw-to-raw conversions produced by the GRM equating methods diverged most from the population conversions produced by the equipercentile method. This finding is not consistent with Lee et al.'s (2001) finding that the GRM equating method produced results more consistent with those of the equipercentile method than the 3PL method did. This inconsistency might be caused by the difference in model selection: the 2PL was selected in this study, whereas the 3PL was selected in Lee et al.'s study, and LID has been shown to have a large impact on the c-parameter estimation.…”
Section: Discussion (contrasting)
confidence: 87%
“…Alternatively, the generalized partial credit model is another valid choice for ordered categorical responses. Because neither model consistently exhibits superiority over the other based on the existing literature (Cao, Yin, & Gao, 2007; Lee et al., 2001; Tang & Eignor, 1997), the selection of the GRM is arbitrary in this study. The GRM directly models the cumulative category response function. Under the GRM, the probability of examinee j earning a score on item i at or above category k can be expressed as:…”
Section: The Graded Response Model (GRM) (mentioning)
confidence: 99%
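The cumulative response function the excerpt refers to is standard in Samejima's graded response model: the probability of scoring at or above category k is a logistic function of ability, and category probabilities are differences of adjacent cumulative probabilities. A small sketch under those standard assumptions (parameter names are illustrative; some formulations also multiply the logit by a scaling constant D = 1.7, omitted here):

```python
import numpy as np

def grm_probs(theta, a, b):
    """Category probabilities under Samejima's graded response model.

    theta : examinee ability
    a     : item discrimination
    b     : array of K-1 ordered category thresholds (K score categories)
    Returns an array of K probabilities, one per score category.
    """
    b = np.asarray(b, dtype=float)
    # Cumulative probability P* of responding at or above each category k >= 1
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Bound with P*(>= lowest category) = 1 and P*(> highest category) = 0,
    # then take differences of adjacent cumulative probabilities
    upper = np.concatenate(([1.0], p_star))
    lower = np.concatenate((p_star, [0.0]))
    return upper - lower
```

For a testlet scored 0–3 with symmetric thresholds around theta, the resulting category probabilities sum to 1 and are symmetric about the middle categories.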
“…It is well known that the Rasch model, and IRT models in general, are not robust with respect to violations of the local item independence assumption. The inclusion of items with local item dependence (LID) may result in contaminated estimates of test reliability, person and item parameters, standard errors, and equating coefficients; see, for instance, Yen (1984), Thissen, Steinberg and Mooney (1989), Sireci, Thissen and Wainer (1991), Yen (1993), Wainer and Thissen (1996), Lee, Kolen, Frisbie and Ankenmann (2001) and Tuerlinckx and De Boeck (2001). Next to this, some research has been devoted to the development of tests or indices for the detection of violations of the conditional independence assumption; see van den Wollenberg (1982), Rosenbaum (1984, 1988), Yen (1984), Stout (1987, 1990), Stout et al. (1996), Chen and Thissen (1997), Douglas, Kim, Habing and Gao (1998) and Ip (2001).…”
Section: Introduction (mentioning)
confidence: 99%
“…Traditionally, the effects of testlets were ignored and the items constituting a testlet were scored as if they were independent items (Bradlow, Wainer, & Wang, 1999; Lee, Kolen, Frisbie, & Ankenmann, 2001; Sireci et al., 1991; Wainer, 1995; Wainer & Wang, 2000; Yen, 1993). The DIF analyses performed in these studies are considered at the item level (Wainer & Thissen, 1996).…”
Section: Differential Item Functioning (DIF) (mentioning)
confidence: 99%