The primary purpose of this study was to determine the extent to which three item response theory (IRT) models could be used to approximate the three-parameter logistic model in estimating item parameters and in equating test scores. These approximate models were less expensive to apply and in some cases used less data than the full-blown three-parameter model.

The approximations to the three-parameter model used in this study were (1) the Rasch one-parameter model, as operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on grouped data divided into fifths and twentieths; and (3) a modified three-parameter logistic model with fixed a's and c's. The LOGIST computer program was used to estimate parameters for the modified three-parameter model; Quantile, a modified version of LOGIST that accepted coarsely grouped data, was used to estimate item parameters for the approximate three-parameter model.

In the case of the approximate models involving BICAL and LOGIST, results of separate item calibrations were used to place item parameter estimates on the same scale. In the case of the approximate model involving Quantile, the item parameter estimates were scaled indirectly through existing SAT scaled scores.

The data for the study came from a recent study (Petersen, Cook, & Stocking, 1983) of scale stability for the Scholastic Aptitude Test. As in the previous study, this study involved the chain equating of a test to itself through five intermediary forms.
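For readers unfamiliar with the models being compared, the following is a minimal sketch (not part of the original report; function names and the D = 1.7 scaling convention are illustrative) of the three-parameter logistic item response function and of the Rasch model as its constrained special case:

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic (3PL) probability of a correct response.

    theta -- examinee ability
    a     -- item discrimination
    b     -- item difficulty
    c     -- lower asymptote (pseudo-guessing parameter)
    D     -- scaling constant; 1.7 approximates the normal-ogive metric
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def p_rasch(theta, b):
    """Rasch (one-parameter) model: the 3PL with a fixed at 1 and c at 0."""
    return p_3pl(theta, a=1.0, b=b, c=0.0, D=1.0)

# At theta == b, the 3PL probability is midway between c and 1:
# p_3pl(0.0, a=1.0, b=0.0, c=0.2) -> 0.6; p_rasch(0.0, 0.0) -> 0.5
```

The "modified three-parameter model with fixed a's and c's" described above corresponds to estimating only b while holding a and c at preset values, which is why it is cheaper to apply than the full model.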
The sample consisted of approximately 2,670 cases for each of the SAT forms used.

The results of the study were as follows: (1) the item calibrations based on twentieths were closer to the true values and to LOGIST estimates than item calibrations based on fifths; (2) the equating results based on twentieths, however, were generally not more accurate than those based on fifths; (3) the three-parameter model using coarse groupings yielded highly accurate score conversions in equating a test to itself, more accurate in fact than the full-blown three-parameter models studied by Petersen, Cook, and Stocking; and (4) all of the approximate models yielded very accurate equating results.

A follow-up analysis indicated that these unexpected equating results were due in large part to the indirect method used to place item parameter estimates on scale through existing score conversions derived from conventional equating methods. The success of the approximate models raises a question about the adequacy of equating a test to itself as a criterion for evaluating equating results. Further research is recommended before any of the approximate models are used operationally.
An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores

The increasing internal and external demands made on testing programs have underscored the inflexibility of traditionally used score equating methods. Item response theory (IRT) equating offers several advantages in this context, including improved eq...