Internal measures of differential functioning of items and tests (DFIT) based on item response theory (IRT) are proposed. Within the DFIT context, the new differential test functioning (DTF) index leads to two new measures of differential item functioning (DIF) with the following properties: (1) The compensatory DIF (CDIF) indexes for all items in a test sum to the DTF index for that test and, unlike current DIF procedures, the CDIF index for an item does not assume that the other items in the test are unbiased; (2) the noncompensatory DIF (NCDIF) index, which assumes that the other items in the test are unbiased, is comparable to some of the IRT-based DIF indexes; and (3) CDIF and NCDIF, as well as DTF, are equally valid for polytomous and multidimensional IRT models. Monte Carlo study results, comparing these indexes with Lord's χ2 test, the signed area measure, and the unsigned area measure, demonstrate that the DFIT framework is accurate in assessing DTF, CDIF, and NCDIF.
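The additivity property described above (CDIF indexes summing to DTF, while NCDIF treats the other items as unbiased) can be sketched numerically. This is a minimal illustration, assuming the indexes are defined as moments of per-examinee item-level difference scores in the style of the DFIT framework; the function name and the simulated input are hypothetical, not part of the abstract.

```python
import numpy as np

def dfit_indices(d):
    """Sketch of DFIT-style indexes from item-level difference scores.

    d : (n_examinees, n_items) array, where d[j, i] is the difference
        between the focal- and reference-group expected scores on item i,
        evaluated at examinee j's trait level.
    Returns (CDIF per item, NCDIF per item, DTF).
    """
    mu = d.mean(axis=0)                        # mean difference per item
    D = d.sum(axis=1)                          # test-level difference score
    mu_D, var_D = D.mean(), D.var()
    # CDIF_i = Cov(d_i, D) + mu_i * mu_D  -- sums to DTF across items
    cov = ((d - mu) * (D - mu_D)[:, None]).mean(axis=0)
    cdif = cov + mu * mu_D
    # NCDIF_i = Var(d_i) + mu_i^2  -- ignores the other items entirely
    ncdif = d.var(axis=0) + mu ** 2
    # DTF = Var(D) + mu_D^2, which equals the sum of the CDIF_i
    dtf = var_D + mu_D ** 2
    return cdif, ncdif, dtf
```

On any simulated difference scores, the CDIF values sum to DTF exactly (up to floating-point error), which is the compensatory property the abstract emphasizes; NCDIF has no such constraint.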
In multiple regression, optimal linear weights are obtained using an ordinary least squares (OLS) procedure. However, these linear weighted combinations of predictors may not optimally predict the same criterion in the population from which the sample was drawn (population validity) or in other samples drawn from the same population (population cross-validity). To achieve more accurate estimates of population validity and population cross-validity, some researchers and practitioners use formula-based or traditional empirical methods. Others have suggested the equal weights procedure as an alternative to both. This review found that formula-based procedures can be used in place of empirical validation for estimating population validity, and in place of empirical cross-validation for estimating population cross-validity. The equal weights procedure is a viable alternative when the observed multiple correlation is low to moderate and the variability among predictor-criterion correlations is low. Despite these findings, it is difficult to recommend one formula-based estimate over another because no single study has included all of the currently available formulas. Suggestions are offered for future research and application of these techniques.
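The contrast between OLS weighting and the equal weights procedure discussed above can be sketched as follows. This is a minimal illustration, not a procedure from the review: the function names are hypothetical, and the equal-weights composite here simply standardizes each predictor and sums with unit weights, so no weights are estimated from the sample and nothing can shrink on cross-validation.

```python
import numpy as np

def equal_weights_r(X, y):
    """Correlation of criterion y with a unit-weighted composite.

    Each predictor is standardized, then summed with equal (unit)
    weights; no parameters are fit to the sample.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.corrcoef(Z.sum(axis=1), y)[0, 1]

def ols_r(X, y):
    """Multiple correlation R from an OLS fit (intercept included)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.corrcoef(A @ beta, y)[0, 1]
```

In the derivation sample, OLS necessarily yields a correlation at least as high as any fixed-weight composite; the case for equal weights, as the review notes, rests on their predictive accuracy holding up in new samples.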
An empirical Monte Carlo study was performed using predictor and criterion data from 84,808 U.S. Air Force enlistees. Samples were drawn for each of seven sample size conditions: 25, 40, 60, 80, 100, 150, and 200. Using an eight-predictor model, 500 estimates for each of 9 validity and 11 cross-validity estimation procedures were generated for each sample size condition. These estimates were then compared to the actual squared population validity and cross-validity in terms of mean bias and mean squared bias. For the regression models determined using ordinary least squares, the Ezekiel procedure produced the most accurate estimates of squared population validity (followed by the Smith and the Wherry procedures), and Burket's formula resulted in the best estimates of squared population cross-validity. Other analyses compared the coefficients determined by traditional empirical cross-validation and equal weights; equal weights resulted in no loss of predictive accuracy and less shrinkage. Numerous issues for future basic research on validation and cross-validation are identified.
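The formula-based procedures compared in the study above adjust the observed squared multiple correlation for capitalization on chance. As a hedged sketch (labels for these shrinkage formulas vary across sources, so the attributions in the function names are one common convention, not a claim about the exact forms used in the study):

```python
def ezekiel(r2, n, p):
    """Ezekiel-style shrinkage estimate of squared population validity.

    This is the familiar 'adjusted R^2' form: 1 - (1 - R^2)(n - 1)/(n - p - 1),
    with n the sample size and p the number of predictors. Attribution
    conventions differ across sources.
    """
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def wherry(r2, n, p):
    """One common statement of a Wherry-style formula (conventions vary):
    1 - (1 - R^2)(n - 1)/(n - p)."""
    return 1 - (1 - r2) * (n - 1) / (n - p)
```

For example, with an observed R² of .30 from an eight-predictor model in a sample of 100, the adjusted estimate drops to roughly .24, illustrating the shrinkage these formulas are designed to anticipate.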