The primary purpose of this study was to determine the extent to which three item response theory (IRT) models could be used to approximate the three-parameter logistic model in estimating item parameters and in equating test scores. These approximate models were less expensive to apply and in some cases used less data than the full-blown three-parameter model.

The approximations to the three-parameter model used in this study were (1) the Rasch one-parameter model, as operationalized in the BICAL computer program; (2) an approximate three-parameter logistic model based on grouped data divided into fifths and twentieths; and (3) a modified three-parameter logistic model with fixed a's and c's. The LOGIST computer program was used to estimate parameters for the modified three-parameter model; Quantile, a modified version of LOGIST that accepted coarsely grouped data, was used to estimate item parameters for the approximate three-parameter model.

In the case of the approximate models involving BICAL and LOGIST, results of separate item calibrations were used to place item parameter estimates on the same scale. In the case of the approximate model involving Quantile, the item parameter estimates were scaled indirectly through existing SAT scaled scores.

The data for the study came from a recent study (Petersen, Cook, & Stocking, 1983) of scale stability for the Scholastic Aptitude Test. As in the previous study, this study involved the chain equating of a test to itself through five intermediary forms.
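For readers unfamiliar with the models being compared, the following is a minimal sketch (not part of the original report; function names and the D = 1.7 scaling convention are illustrative) of the three-parameter logistic item response function and of the Rasch model as its constrained special case:

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic (3PL) probability of a correct response.

    theta -- examinee ability
    a     -- item discrimination
    b     -- item difficulty
    c     -- lower asymptote (pseudo-guessing parameter)
    D     -- scaling constant; 1.7 approximates the normal-ogive metric
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def p_rasch(theta, b):
    """Rasch (one-parameter) model: the 3PL with a fixed at 1 and c at 0."""
    return p_3pl(theta, a=1.0, b=b, c=0.0, D=1.0)

# At theta == b, the 3PL probability is midway between c and 1:
# p_3pl(0.0, a=1.0, b=0.0, c=0.2) -> 0.6; p_rasch(0.0, 0.0) -> 0.5
```

The "modified three-parameter model with fixed a's and c's" described above corresponds to estimating only b while holding a and c at preset values, which is why it is cheaper to apply than the full model.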
The sample consisted of approximately 2,670 cases for each of the SAT forms used.

The results of the study were as follows: (1) the item calibrations based on twentieths were closer to the true values and to LOGIST estimates than item calibrations based on fifths; (2) the equating results based on twentieths, however, were generally not more accurate than those based on fifths; (3) the three-parameter model using coarse groupings yielded highly accurate score conversions in equating a test to itself, more accurate in fact than the full-blown three-parameter models studied by Petersen, Cook, and Stocking; and (4) all of the approximate models yielded very accurate equating results.

A follow-up analysis indicated that these unexpected equating results were due in large part to the indirect method used to place item parameter estimates on scale through existing score conversions derived from conventional equating methods. The success of the approximate models raises a question about the adequacy of equating a test to itself as a criterion for evaluating equating results. Further research is recommended before any of the approximate models are used operationally.
An Evaluation of Three Approximate Item Response Theory Models for Equating Test Scores

The increasing internal and external demands made on testing programs have underscored the inflexibility of traditionally used score equating methods. Item response theory (IRT) equating offers several advantages in this context, including improved eq...