In this study, we compared 12 statistical strategies proposed for selecting loglinear models for smoothing univariate test score distributions and for enhancing the stability of equipercentile equating functions. The major focus was on evaluating the effects of the selection strategies on equating function accuracy. Selection strategies’ influence on the estimation of cumulative test score distributions was also assessed. The results of this simulation study differentiate the selection strategies and define the situations where their use has the most important implications for equating function accuracy. The recommended strategy for estimating test score distributions and for equating is AIC minimization.
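As a concrete illustration of the recommended strategy, the sketch below fits polynomial loglinear models of increasing degree to a univariate score-frequency distribution via Poisson regression and keeps the fit with the smallest AIC. The degree range, function names, and simulated counts are illustrative assumptions, not the study's actual models.

```python
import numpy as np
import statsmodels.api as sm

def loglinear_smooth_aic(freqs, max_degree=6):
    """Fit polynomial loglinear models to score frequencies and
    return the degree and fitted frequencies of the minimal-AIC model.

    freqs: observed frequency at each integer score point 0..K.
    Candidate degrees 1..max_degree are a hypothetical search range.
    """
    scores = np.arange(len(freqs), dtype=float)
    # Standardize the score scale to keep the design matrix well conditioned.
    z = (scores - scores.mean()) / scores.std()
    best = None
    for degree in range(1, max_degree + 1):
        X = np.column_stack([z ** c for c in range(degree + 1)])  # c = 0 gives the intercept
        fit = sm.GLM(freqs, X, family=sm.families.Poisson()).fit()
        if best is None or fit.aic < best[0]:
            best = (fit.aic, degree, fit.fittedvalues)
    _, degree, smoothed = best
    return degree, smoothed

# Example: smooth a simulated 41-point raw-score distribution.
rng = np.random.default_rng(0)
raw = rng.poisson(lam=200 * np.exp(-0.5 * ((np.arange(41) - 22) / 8) ** 2))
deg, fitted = loglinear_smooth_aic(raw)
print(f"AIC-selected degree: {deg}")
```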
Score equating is essential for any testing program that continually produces new editions of a test and expects scores from those editions to have the same meaning over time. Particularly in testing programs that inform high-stakes decisions, it is extremely important that test equating be done carefully and accurately. An error in the equating function or score conversion can affect the scores of all examinees, which is both a fairness and a validity concern. Because the reported score is so visible, the credibility of a testing organization hinges on the activities associated with producing, equating, and reporting scores. This paper addresses the practical implications of score equating by describing key aspects of equating and the best practices associated with the equating process.
This study investigates the accuracy of item response theory (IRT) proficiency estimators under multistage testing (MST). We chose a two-stage MST design that includes four modules (one at Stage 1, three at Stage 2) and three difficulty paths (low, middle, high). We assembled various two-stage MST panels (i.e., forms) by manipulating two assembly conditions in each module: difficulty level and module length. For each panel, we investigated the accuracy of examinees' proficiency estimates derived from seven IRT proficiency estimators. The choice of Bayesian (prior) versus non-Bayesian (no prior) estimators was of more practical significance than the choice of number-correct versus item-pattern scoring estimators. The Bayesian estimators were slightly more efficient than the non-Bayesian estimators, resulting in smaller overall error. Score changes caused by the use of different proficiency estimators would be nonnegligible, particularly for low- and high-performing examinees.
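To make the estimator contrast concrete, the following sketch compares a non-Bayesian maximum likelihood estimate with a Bayesian EAP estimate for a single response pattern under a 2PL model. The item parameters, response pattern, and grid settings are hypothetical assumptions; the study's actual estimators and MST modules differ.

```python
import numpy as np

def irf_2pl(theta, a, b):
    """2PL item response function: P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def likelihood(theta_grid, responses, a, b):
    """Likelihood of one response pattern at each grid point."""
    p = irf_2pl(theta_grid[:, None], a, b)            # grid x items
    return np.prod(np.where(responses, p, 1 - p), axis=1)

# Hypothetical 2PL parameters for a short module path.
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.4])
responses = np.array([1, 1, 0, 1, 0], dtype=bool)

grid = np.linspace(-4, 4, 401)
L = likelihood(grid, responses, a, b)

# Non-Bayesian: maximum likelihood over the grid (no prior).
theta_ml = grid[np.argmax(L)]

# Bayesian: EAP with a standard normal prior shrinks toward 0,
# trading a little bias for smaller overall error.
prior = np.exp(-0.5 * grid ** 2)
post = L * prior
theta_eap = np.sum(grid * post) / np.sum(post)

print(f"ML estimate:  {theta_ml:.3f}")
print(f"EAP estimate: {theta_eap:.3f}")
```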
The purpose of this study was to empirically evaluate the impact of loglinear presmoothing accuracy on equating bias and variability across chained and post-stratification equating methods, kernel and percentile-rank continuization methods, and sample sizes. The results of evaluating presmoothing on equating accuracy generally agreed with those of previous presmoothing studies, suggesting that less parameterized presmoothing models are more biased and less variable than highly parameterized presmoothing models and raw data. Estimates of standard errors of equating were most accurate when based on large sample sizes and score-level data that were not sparse. The accuracy of standard error estimates was not influenced by the correctness of the presmoothing model. The accuracy of estimates of the standard errors of equating differences was also evaluated. The study concludes with some detailed comparisons of how the kernel and traditional equipercentile continuization methods interacted with data that were presmoothed to different degrees.
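The percentile-rank continuization that the study contrasts with kernel continuization can be sketched as follows: compute midpoint percentile ranks on each form, then map each form-X score to the form-Y score with the same percentile rank by linear interpolation. The helper names and the simulated frequencies standing in for presmoothed counts are assumptions for illustration.

```python
import numpy as np

def percentile_ranks(freqs):
    """Percentile rank at each integer score: the midpoint of the
    cumulative step, i.e., the usual percentile-rank continuization."""
    cum = np.cumsum(freqs)
    below = cum - freqs
    return 100.0 * (below + freqs / 2.0) / cum[-1]

def equipercentile(freqs_x, freqs_y):
    """Map each form-X score to the form-Y score with the same
    percentile rank, interpolating linearly between Y score points."""
    pr_x = percentile_ranks(freqs_x)
    pr_y = percentile_ranks(freqs_y)
    scores_y = np.arange(len(freqs_y), dtype=float)
    return np.interp(pr_x, pr_y, scores_y)

# Hypothetical smoothed-style frequencies on two 21-point forms
# (+1 keeps every score point nonsparse so ranks stay monotone).
rng = np.random.default_rng(1)
fx = rng.poisson(100 * np.exp(-0.5 * ((np.arange(21) - 9) / 4) ** 2)) + 1
fy = rng.poisson(100 * np.exp(-0.5 * ((np.arange(21) - 11) / 4) ** 2)) + 1
print(np.round(equipercentile(fx, fy), 2))
```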
This chapter summarizes contributions ETS researchers have made concerning the applications of, refinements to, and developments in item analysis procedures. The focus is on dichotomously scored items, which allows for a simplified presentation that is consistent with the focus of the developments and which has straightforward applications to polytomously scored items. Item analysis procedures refer to a set of statistical measures used by testing experts to review and revise items, to estimate the characteristics of potential test forms, and to make judgments about the quality of items and assembled test forms. These procedures and statistical measures have been alternatively characterized as conventional item analysis (Lord 1961, 1965a), traditional item analysis (Wainer 1989), analyses associated with classical test theory (Embretson and Reise 2000; Hambleton 1989; Tucker 1987; Yen and Fitzpatrick 2006), and simply item analysis (Gulliksen 1950; Livingston and Dorans 2004). This chapter summarizes key concepts of item analysis described in the sources cited. The first section describes item difficulty and discrimination indices. Subsequent sections review discussions about the relationships of item scores and test scores, visual displays of item analysis, and the additional roles item analysis methods have played in various psychometric contexts. The key concepts described in each section are summarized in Table 2.1.
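A minimal sketch of the two indices described in the first section: difficulty as the proportion correct (the p-value) and discrimination as a corrected item-total (point-biserial) correlation. The response data are simulated and the function name is an assumption; the cited sources describe many variants of these statistics.

```python
import numpy as np

def item_analysis(responses):
    """Classical item statistics for a 0/1 response matrix
    (rows = examinees, columns = items).

    Difficulty: proportion correct (p-value).
    Discrimination: point-biserial correlation with the total score,
    corrected by removing the item itself from the criterion.
    """
    n_items = responses.shape[1]
    total = responses.sum(axis=1)
    p = responses.mean(axis=0)
    r = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]   # criterion excludes item j
        r[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return p, r

# Simulated responses for 500 examinees on 10 dichotomous items.
rng = np.random.default_rng(2)
theta = rng.normal(size=(500, 1))
difficulty = np.linspace(-1.5, 1.5, 10)
prob = 1 / (1 + np.exp(-(theta - difficulty)))
resp = (rng.random((500, 10)) < prob).astype(int)

p, r = item_analysis(resp)
for j, (pj, rj) in enumerate(zip(p, r)):
    print(f"item {j + 1:2d}: p = {pj:.2f}, r_pbis = {rj:.2f}")
```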