Through simulations, this study investigates the effects of anchor item methods on Type I error and power of detecting differential item functioning (DIF) using the likelihood ratio test within the framework of item response theory. Four anchor item methods were compared: the all-other, 1-item, 4-item, and 10-item methods. The results showed that it is the average signed area between the reference and focal groups rather than the percentage of DIF items in a test that determines the Type I error of the all-other method. The all-other method yields good control over Type I error and reasonable power only when the average signed area approaches zero. The all-other method is not recommended for practical DIF analysis because it is only adequate under very stringent conditions. The other three methods perform appropriately under all the simulated conditions. The more anchor items are used, the higher the power of DIF detection.
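The signed-area idea can be made concrete with a small numerical sketch. Under a two-parameter logistic (2PL) model, the signed area between the reference-group and focal-group item characteristic curves reduces to the difficulty difference when the discriminations are equal; the function names and values below are illustrative, not taken from the study's code.

```python
import numpy as np

def icc(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def signed_area(b_ref, b_foc, a_ref=1.0, a_foc=1.0, lo=-8.0, hi=8.0, n=4001):
    """Signed area between reference and focal ICCs (numerical integration).

    With equal discriminations this is approximately b_foc - b_ref:
    positive values mean the item is harder for the focal group.
    """
    theta = np.linspace(lo, hi, n)
    diff = icc(theta, a_ref, b_ref) - icc(theta, a_foc, b_foc)
    return np.sum(diff) * (hi - lo) / (n - 1)

# One item favoring the reference group ...
area = signed_area(b_ref=0.0, b_foc=0.5)
# ... and a balanced pair of DIF items whose average signed area is near zero,
# the condition under which the all-other method behaves acceptably.
avg = (signed_area(0.0, 0.5) + signed_area(0.0, -0.5)) / 2.0
```

When DIF items all favor the same group, the average signed area moves away from zero and, per the study, the all-other method's Type I error control breaks down.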
The Rasch testlet model for both dichotomous and polytomous items in testlet-based tests is proposed. It can be viewed as a special case of the multidimensional random coefficients multinomial logit model (MRCMLM); therefore, the estimation procedures for the MRCMLM can be directly applied. Simulations were conducted to examine parameter recovery under the dichotomous Rasch testlet model and the partial-credit testlet model. Results indicated that the item and person parameters as well as the random testlet effects could be recovered very accurately under all the simulated conditions. As sample sizes increased, the root mean square errors of the estimates decreased to an acceptable level. An empirical example of an English test with 11 testlets is given. Index terms: multidimensional item response model, item bundle, marginal maximum likelihood estimation, parameter recovery.
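As a concrete sketch of the dichotomous case, the model adds a person-specific random testlet effect to the usual Rasch linear predictor. The simulation below uses illustrative sizes and parameter values (not the study's design) to generate responses under logit P(X_pi = 1) = theta_p + gamma_{p,d(i)} - b_i:

```python
import numpy as np

rng = np.random.default_rng(42)

n_persons, n_items, n_testlets = 500, 12, 3
items_per_testlet = n_items // n_testlets
testlet_of = np.repeat(np.arange(n_testlets), items_per_testlet)  # d(i)

theta = rng.normal(0.0, 1.0, n_persons)      # person abilities
b = rng.uniform(-1.5, 1.5, n_items)          # item difficulties
sigma = np.array([0.5, 1.0, 1.5])            # testlet-effect SDs (hypothetical)
# Random testlet effects: one gamma per person-testlet combination.
gamma = rng.normal(0.0, sigma, size=(n_persons, n_testlets))

# Dichotomous Rasch testlet model:
# logit P(X_pi = 1) = theta_p + gamma_{p, d(i)} - b_i
logit = theta[:, None] + gamma[:, testlet_of] - b[None, :]
prob = 1.0 / (1.0 + np.exp(-logit))
resp = (rng.random((n_persons, n_items)) < prob).astype(int)
```

Setting all testlet-effect variances to zero recovers the ordinary Rasch model, which is what makes the testlet model a natural special case to estimate within the MRCMLM framework.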
Extreme response style (ERS) is a systematic tendency for a person to endorse extreme options (e.g., strongly disagree, strongly agree) on Likert-type or rating-scale items. In this study, we develop a new class of item response theory (IRT) models to account for ERS so that the target latent trait is free from the response style and the tendency of ERS is quantified. Parameters of these new models can be estimated with marginal maximum likelihood estimation methods or Bayesian methods. In this study, we use the freeware program WinBUGS, which implements Bayesian methods. In a series of simulations, we find that the parameters are recovered fairly well, that ignoring ERS by fitting standard IRT models results in biased estimates, and that fitting the new models to data without ERS does little harm. Two empirical examples are provided to illustrate the implications and applications of the new models.
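The abstract does not give the models' exact form, but the mechanism can be illustrated with a toy partial-credit setup in which a hypothetical person-level parameter omega rescales the category thresholds: larger omega shrinks the threshold spread and pushes probability mass toward the extreme categories. This is an illustrative sketch, not the authors' parameterization.

```python
import numpy as np

def pcm_probs(theta, thresholds):
    """Partial-credit-model category probabilities for one person on one item."""
    cum = np.concatenate([[0.0], np.cumsum(theta - thresholds)])
    expcum = np.exp(cum - cum.max())          # stabilized softmax over categories
    return expcum / expcum.sum()

# Hypothetical ERS mechanism: dividing the thresholds by omega shrinks their
# spread, so omega > 1 mimics an extreme response style and omega < 1 a
# midpoint-preferring style.
base_thresholds = np.array([-1.5, -0.5, 0.5, 1.5])   # 5-point Likert item
p_midpoint = pcm_probs(0.0, base_thresholds / 0.5)   # omega = 0.5
p_extreme = pcm_probs(0.0, base_thresholds / 2.0)    # omega = 2.0
```

Two respondents with identical theta but different omega produce visibly different response distributions, which is why ignoring ERS contaminates estimates of the target trait.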
A conventional way to analyze item responses in multiple tests is to apply unidimensional item response models separately, one test at a time. This unidimensional approach, which ignores the correlations between latent traits, yields imprecise measures when tests are short. To resolve this problem, one can use multidimensional item response models that exploit correlations between latent traits to improve the measurement precision of individual latent traits. The improvements are demonstrated using two empirical examples. It appears that the multidimensional approach improves measurement precision substantially, especially when tests are short and the number of tests is large. To achieve the same measurement precision, the multidimensional approach needs fewer than half the comparable items required by the unidimensional approach.
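A back-of-envelope normal approximation shows why borrowing strength across correlated traits helps; all numbers here are illustrative and this is not the authors' estimator. If a second, correlated trait were known, the conditional prior variance of the first trait would shrink from 1 to 1 - rho^2, tightening the posterior for the same amount of test information:

```python
import numpy as np

def posterior_sd(prior_var, test_info):
    """Normal approximation: posterior variance = 1 / (1/prior_var + info)."""
    return np.sqrt(1.0 / (1.0 / prior_var + test_info))

test_info = 4.0   # Fisher information from a short test (hypothetical)
rho = 0.8         # assumed correlation between two latent traits

se_uni = posterior_sd(1.0, test_info)             # unidimensional: N(0, 1) prior
se_multi = posterior_sd(1.0 - rho**2, test_info)  # conditional prior var 1 - rho^2
```

The gap between the two standard errors widens as tests get shorter (smaller `test_info`) or traits grow more correlated, matching the pattern the abstract reports.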
The multiple indicators, multiple causes (MIMIC) method with a short, pure anchor was proposed to detect differential item functioning (DIF). A simulation study showed that the MIMIC method with an anchor of 1, 2, 4, or 10 DIF-free items yielded a well-controlled Type I error rate even when the test contained as many as 40% DIF items. In general, a longer anchor increased the power of DIF detection, and a 4-item anchor was long enough to yield high power. An iterative MIMIC procedure was proposed to locate a set of DIF-free items to function as a pure anchor so that the MIMIC method could proceed properly. In another simulation study, it was found that this iterative procedure yielded a perfect (or nearly perfect) rate of accuracy in locating a set of up to 4 DIF-free items.
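The MIMIC decomposition behind this approach can be sketched as a data-generating model (all parameter values hypothetical): the grouping variable may shift the latent trait itself (impact, which is not DIF), while a nonzero direct effect of group on an item, beta, constitutes uniform DIF.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
group = rng.integers(0, 2, n)                 # 0 = reference, 1 = focal
# Structural part: group predicts the latent trait (impact, not DIF).
theta = 0.3 * group + rng.normal(0.0, 1.0, n)

def item_prob(theta, group, lam, tau, beta):
    """MIMIC-style measurement model for a dichotomous item:
    logit P = lam * theta - tau + beta * group.
    A nonzero direct effect beta is uniform DIF."""
    return 1.0 / (1.0 + np.exp(-(lam * theta - tau + beta * group)))

p_clean = item_prob(theta, group, lam=1.0, tau=0.0, beta=0.0)  # DIF-free item
p_dif = item_prob(theta, group, lam=1.0, tau=0.0, beta=0.8)    # uniform DIF
```

Anchoring on known DIF-free items is what lets the model separate the trait shift (the 0.3 path) from the direct item effects (the betas); without a pure anchor, the two are confounded.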
The DIF-free-then-DIF (DFTD) strategy consists of two steps: (a) select a set of items that are the most likely to be DIF-free and (b) assess the other items for DIF (differential item functioning) using the designated items as anchors. The rank-based method, together with the computer software IRTLRDIF, can select a set of DIF-free polytomous items very accurately, but it loses accuracy when tests contain many DIF items. To resolve this problem, the authors developed a new method by adding a scale purification procedure to the rank-based method and conducted two simulation studies to evaluate its performance in DIF assessment. It was found that the new method outperformed the rank-based method in identifying DIF-free items, especially when the tests contained many DIF items. In addition, the new method, combined with the DFTD strategy, yielded a well-controlled Type I error rate and high power of DIF detection. In contrast, conventional DIF assessment methods yielded an inflated Type I error rate and deflated power when the tests contained many DIF items favoring the same group. In conclusion, the simulation results support the new method and the DFTD strategy in DIF assessment.
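The purification loop itself is simple to sketch. In the schematic below, `dif_statistic` is a hypothetical stand-in for the rank-based statistic computed with IRTLRDIF; the loop re-ranks every item against the current tentative anchor set and repeats until the set of lowest-ranked (most likely DIF-free) items stabilizes.

```python
def purify_anchors(items, dif_statistic, n_anchor=4, max_iter=10):
    """Schematic scale-purification loop (hypothetical interface).

    dif_statistic(item, anchors) should return a DIF statistic for `item`
    computed with `anchors` as the DIF-free linking set. The loop iterates
    until the n_anchor lowest-ranked items stop changing.
    """
    anchors = set(items[:n_anchor])          # arbitrary starting anchor set
    for _ in range(max_iter):
        ranked = sorted(items, key=lambda it: dif_statistic(it, anchors))
        new_anchors = set(ranked[:n_anchor])
        if new_anchors == anchors:           # anchor set is stable: stop
            break
        anchors = new_anchors
    return anchors
```

The point of the iteration is that a contaminated anchor set biases every item's statistic; re-ranking against a progressively cleaner anchor set is what restores accuracy when many items show DIF.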