Item response theory item parameters can be estimated using data from a common-item equating design either separately for each form or concurrently across forms. This paper reports the results of a simulation study of separate versus concurrent item parameter estimation. Using simulated data from a test with 60 dichotomous items, four factors were considered: (a) estimation program (MULTILOG versus BILOG-MG), (b) sample size per form (3,000 versus 1,000), (c) number of common items (20 versus 10), and (d) equivalent versus nonequivalent groups taking the two forms (no mean difference versus a mean difference of 1 SD). In addition, four methods of item parameter scaling were used in the separate estimation condition: two item characteristic curve methods (Stocking-Lord and Haebara) and two moment methods (Mean/Mean and Mean/Sigma). Concurrent estimation generally resulted in lower error than separate estimation, although not universally so. The results suggest that one factor behind the lower error of concurrent estimation may be that the estimates of the common-item parameters are based on larger samples. It is argued that the results of this study, together with other research on this topic, are not sufficient to recommend abandoning separate estimation in favor of concurrent estimation.
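For the two moment methods named above, the linking constants have closed forms; the characteristic curve methods (Stocking-Lord and Haebara) instead require a numerical optimization over the item or test characteristic curves and are not sketched here. A minimal sketch in Python, assuming the usual 2PL/3PL convention theta_base = A * theta_old + B (so a* = a / A, b* = A * b + B, c* = c); the function names are illustrative, not taken from either estimation program:

```python
import numpy as np

def moment_linking(a_old, b_old, a_base, b_base, method="mean/sigma"):
    """Slope (A) and intercept (B) for theta_base = A * theta_old + B,
    computed from the common items' parameter estimates on the two scales."""
    a_old, b_old = np.asarray(a_old), np.asarray(b_old)
    a_base, b_base = np.asarray(a_base), np.asarray(b_base)
    if method == "mean/sigma":
        # match mean and SD of the common items' difficulty estimates
        A = b_base.std(ddof=1) / b_old.std(ddof=1)
    elif method == "mean/mean":
        # match the means of the common items' discrimination estimates
        A = a_old.mean() / a_base.mean()
    else:
        raise ValueError(f"unknown method: {method}")
    B = b_base.mean() - A * b_old.mean()
    return A, B

def rescale_items(a, b, c, A, B):
    """Place item parameters estimated on the old scale onto the base scale."""
    return np.asarray(a) / A, A * np.asarray(b) + B, np.asarray(c)
```

Given the common-item estimates from the two separate calibrations, `moment_linking` yields (A, B), and `rescale_items` then places all of the new form's parameters on the base scale.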
This article proposes an item response model that incorporates response time, along with a parameter estimation procedure based on the EM algorithm. The procedure is evaluated with both real and simulated test data, and the results suggest that it recovers the model parameters well. Using response-time data also improves the estimation of person ability parameters. Potential applications of the model are discussed, and directions for further study are suggested.
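The abstract does not give the model's functional form, so the sketch below illustrates only the EM machinery it relies on, applied to a plain 2PL with the response-time component omitted; this is the standard Bock-Aitkin marginal maximum likelihood scheme, not the authors' specific model, and the names, quadrature range, and defaults are illustrative:

```python
import numpy as np
from scipy.special import expit
from scipy.optimize import minimize

def em_2pl(X, n_iters=30, n_quad=31):
    """Bock-Aitkin MML-EM for a 2PL. X: (persons, items) 0/1 response matrix."""
    n_persons, n_items = X.shape
    nodes = np.linspace(-4, 4, n_quad)           # quadrature grid over theta
    w = np.exp(-0.5 * nodes**2); w /= w.sum()    # discretized N(0, 1) prior
    a, b = np.ones(n_items), np.zeros(n_items)
    for _ in range(n_iters):
        # E-step: posterior weight of each quadrature node for each person
        p = np.clip(expit(a * (nodes[:, None] - b)), 1e-9, 1 - 1e-9)
        ll = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T   # (persons, nodes)
        post = np.exp(ll - ll.max(axis=1, keepdims=True)) * w
        post /= post.sum(axis=1, keepdims=True)
        nk = post.sum(axis=0)      # expected number of examinees at each node
        rk = post.T @ X            # expected correct responses, (nodes, items)
        # M-step: maximize the expected complete-data log-likelihood per item
        for j in range(n_items):
            def neg_ell(par, j=j):  # par = (log a_j, b_j); log keeps a_j > 0
                pj = np.clip(expit(np.exp(par[0]) * (nodes - par[1])),
                             1e-9, 1 - 1e-9)
                return -(rk[:, j] * np.log(pj)
                         + (nk - rk[:, j]) * np.log(1 - pj)).sum()
            res = minimize(neg_ell, [np.log(a[j]), b[j]], method="Nelder-Mead")
            a[j], b[j] = np.exp(res.x[0]), res.x[1]
    return a, b
```

A response-time extension would add the timing parameters to the complete-data log-likelihood maximized in the M-step; the E-step/M-step alternation itself is unchanged.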
The purpose of this study was to compare and evaluate five on-line pretest item-calibration/scaling methods in computerized adaptive testing (CAT): marginal maximum likelihood estimation with one EM cycle (OEM), marginal maximum likelihood estimation with multiple EM cycles (MEM), Stocking's Method A, Stocking's Method B, and BILOG/Prior. The five methods were evaluated in terms of item-parameter recovery using three sample sizes (300, 1,000, and 3,000). The MEM method appeared to be the best choice, because it produced the smallest parameter-estimation errors for all sample-size conditions. Although MEM and OEM are mathematically similar, the OEM method produced larger errors, so MEM was preferable unless the time required for iterative computation is a concern. Stocking's Method B also worked very well, but it requires anchor items that would either lengthen the test or require larger samples, depending on the test administration design. Until more appropriate ways of handling sparse data are devised, the BILOG/Prior method may not be a reasonable choice for small sample sizes. Stocking's Method A had the largest weighted total error as well as a theoretical weakness (it treats estimated ability as true ability); thus, there appeared to be little reason to use it.
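The OEM/MEM distinction can be made concrete: both fix the operational item parameters and estimate only the pretest items, but OEM stops after a single EM cycle whose E-step uses the operational items alone, whereas MEM keeps cycling and folds the provisional pretest estimates into later posteriors. A schematic sketch, assuming a 2PL, dense pretest responses (no adaptive sparseness), the same examinees for both item sets, and illustrative names throughout:

```python
import numpy as np
from scipy.special import expit
from scipy.optimize import minimize

def calibrate_pretest(X_op, a_op, b_op, X_new, n_cycles=1, n_quad=31):
    """Pretest calibration sketch. n_cycles=1 mimics OEM (posteriors from
    operational items only); n_cycles>1 mimics MEM (later posteriors also
    use the provisional pretest estimates)."""
    X_op, X_new = np.asarray(X_op), np.asarray(X_new)
    a_op, b_op = np.asarray(a_op, float), np.asarray(b_op, float)
    nodes = np.linspace(-4, 4, n_quad)
    w = np.exp(-0.5 * nodes**2); w /= w.sum()
    a_new = np.ones(X_new.shape[1]); b_new = np.zeros(X_new.shape[1])

    def posteriors(a, b, X):
        p = np.clip(expit(a * (nodes[:, None] - b)), 1e-9, 1 - 1e-9)
        ll = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
        post = np.exp(ll - ll.max(axis=1, keepdims=True)) * w
        return post / post.sum(axis=1, keepdims=True)

    for cycle in range(n_cycles):
        if cycle == 0:   # E-step from operational items only (OEM's one cycle)
            post = posteriors(a_op, b_op, X_op)
        else:            # MEM: include provisional pretest items in the E-step
            post = posteriors(np.r_[a_op, a_new], np.r_[b_op, b_new],
                              np.hstack([X_op, X_new]))
        nk, rk = post.sum(axis=0), post.T @ X_new
        for j in range(X_new.shape[1]):   # M-step over the pretest items only
            def neg_ell(par, j=j):
                pj = np.clip(expit(np.exp(par[0]) * (nodes - par[1])),
                             1e-9, 1 - 1e-9)
                return -(rk[:, j] * np.log(pj)
                         + (nk - rk[:, j]) * np.log(1 - pj)).sum()
            res = minimize(neg_ell, [np.log(a_new[j]), b_new[j]],
                           method="Nelder-Mead")
            a_new[j], b_new[j] = np.exp(res.x[0]), res.x[1]
    return a_new, b_new
```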
This article describes procedures for estimating various indices of classification consistency and accuracy for multiple-category classifications using data from a single test administration. Estimates of the classification consistency and accuracy indices are compared under three psychometric models: the two-parameter beta binomial, the four-parameter beta binomial, and the three-parameter logistic item response theory (IRT) model. Using real data sets, the estimation procedures are illustrated and the characteristics of the estimated classification indices are examined, including their behavior as a function of the latent variable. All three components of the models (the estimated true score distributions, the fitted observed score distributions, and the estimated conditional error variances) appear to have considerable influence on the magnitudes of the estimated indices. In practice, model choice should rest on several considerations, including the degree of model fit to the data, the suitability of the model assumptions, and computational feasibility.
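Under the simplest of the three models, the two-parameter beta binomial, both indices reduce to one-dimensional integrals over the true proportion-correct score. A sketch, assuming the fitted Beta(alpha, beta) parameters are already in hand and taking n * pi relative to the observed-score cuts as the true-category rule (one common convention; others exist); the function name and grid size are illustrative:

```python
import numpy as np
from scipy.stats import beta, binom

def classification_indices(alpha, beta_b, n_items, cuts, n_grid=2001):
    """Classification consistency and accuracy under a two-parameter
    beta-binomial model: true proportion-correct pi ~ Beta(alpha, beta_b),
    observed score X | pi ~ Binomial(n_items, pi). Category k holds raw
    scores cuts[k-1] <= X < cuts[k]."""
    pis = np.linspace(1e-6, 1 - 1e-6, n_grid)
    dpi = pis[1] - pis[0]
    f = beta.pdf(pis, alpha, beta_b)
    edges = [0, *cuts, n_items + 1]
    cat_probs = np.stack([              # (categories, grid): P(X in cat | pi)
        binom.cdf(hi - 1, n_items, pis) - binom.cdf(lo - 1, n_items, pis)
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    # consistency: two independent parallel forms land in the same category
    consistency = ((cat_probs**2).sum(axis=0) * f).sum() * dpi
    # accuracy: observed category matches the true category of n_items * pi
    true_cat = np.searchsorted(np.asarray(cuts), n_items * pis, side="right")
    accuracy = (cat_probs[true_cat, np.arange(n_grid)] * f).sum() * dpi
    return consistency, accuracy
```

For example, `classification_indices(8, 4, 40, cuts=[20, 30])` returns the consistency and accuracy indices for a 40-item test split into three categories at raw scores 20 and 30.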