Differential item functioning (DIF) of test items should be evaluated using practical methods that can produce accurate and useful results. Among a plethora of DIF detection techniques, we introduce the new Residual DIF (RDIF) framework, which stands out for its accessibility without sacrificing efficacy. This framework consists of three item response theory (IRT) residual statistics: $RDIF_R$, $RDIF_S$, and $RDIF_{RS}$. We conducted a simulation study with a 40-item test to assess the performance of RDIF in comparison with the Mantel-Haenszel, logistic regression, and IRT-based likelihood ratio test methods. Even when analyzing small sample sizes, the results revealed $RDIF_{RS}$ to be the most robust DIF detection statistic, with strict control of Type I error across all simulated conditions when paired with the purification procedure. Also, $RDIF_R$ and $RDIF_S$ proved to be powerful indicators of uniform and nonuniform DIF, respectively. Therefore, $RDIF_{RS}$ should serve as the primary flagging criterion, whereas $RDIF_R$ and $RDIF_S$ best serve as indicators of DIF type. An empirical DIF study also showed that the RDIF framework could perform satisfactorily with real data from a large-scale assessment. Overall, the RDIF framework demonstrated its potential as a new standard for IRT-based DIF detection methodology.
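The abstract names the three residual statistics without defining them. As a rough illustration of the residual idea only, here is a minimal Python sketch, assuming a 2PL model: the raw residual for an examinee on an item is $u - P(\hat{\theta})$, and simple focal-versus-reference mean differences in raw and squared residuals stand in for the uniform- and nonuniform-DIF signals that $RDIF_R$ and $RDIF_S$ capture. The mean-difference form, the omission of standard errors, and the absence of the combined $RDIF_{RS}$ test are simplifications for exposition, not the authors' exact estimators.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function: P(correct response | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def rdif_sketch(u, theta, a, b, focal):
    """Illustrative residual DIF statistics for one item.

    u      : 0/1 responses to the item
    theta  : ability estimates from the full test
    a, b   : item parameters calibrated on the combined sample
    focal  : boolean mask marking focal-group members

    Returns focal-minus-reference mean differences in raw and
    squared residuals (simplified stand-ins for RDIF_R and RDIF_S,
    without the standard errors needed for formal tests).
    """
    p = p_2pl(theta, a, b)
    raw = u - p            # raw residuals: sensitive to uniform DIF
    sq = raw ** 2          # squared residuals: sensitive to nonuniform DIF
    return (raw[focal].mean() - raw[~focal].mean(),
            sq[focal].mean() - sq[~focal].mean())

# Toy check: with DIF-free data, both statistics should hover near zero.
rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(size=n)
focal = rng.random(n) < 0.5
a, b = 1.2, 0.3
u = (rng.random(n) < p_2pl(theta, a, b)).astype(float)
print(rdif_sketch(u, theta, a, b, focal))
```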
Computerized adaptive testing (CAT) technology is widely used in a variety of licensing and certification examinations administered to health professionals in the United States. Many more countries worldwide are expected to adopt CAT for their national licensing examinations for health professionals because it reduces testing time and yields more accurate estimates of test-taker ability. Continuous improvements to CAT algorithms promote the stability and reliability of the results of such examinations. For this reason, simulation studies are a critically important component of evaluating the design and implementation of CAT programs. This report introduces the principles of SimulCAT, a software program developed for conducting CAT simulation studies. The key evaluation criteria for CAT simulation studies are explained, and some guidelines are offered for practitioners and test developers. A step-by-step tutorial example of a SimulCAT run is also presented. The SimulCAT program supports most of the methods used for the three key components of item selection in CAT: the item selection criterion, item exposure control, and content balancing. Methods for determining the test length (fixed or variable) and score estimation algorithms are also covered. Simulation runs produce output files for the response strings, item usage, standard errors of estimation, Newton-Raphson iteration information, theta estimates, the full response matrix, and the true standard errors of estimation. Because one simulated condition cannot be generalized to another, it is recommended that practitioners perform CAT simulation studies at each stage of CAT development.
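SimulCAT is a standalone program, so the following is not its code. It is a minimal Python sketch of the core loop such a simulator automates, under stated assumptions: maximum Fisher information item selection, Newton-Raphson maximum-likelihood scoring, a fixed test length, and no exposure control or content balancing. The toy item pool, the starting estimate of 0, and the clamp that keeps the MLE finite for perfect response patterns are illustrative choices.

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    """Fisher information of 2PL items at ability theta."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1 - p)

def mle_theta(u, a, b, theta0=0.0):
    """Newton-Raphson MLE of theta, clamped to [-4, 4] so that
    perfect response patterns (whose MLE diverges) stay finite."""
    theta = theta0
    for _ in range(20):
        p = p_2pl(theta, a, b)
        step = np.sum(a * (u - p)) / np.sum(a ** 2 * p * (1 - p))
        theta = float(np.clip(theta + step, -4.0, 4.0))
    return theta

def simulate_cat(true_theta, a, b, test_length, rng):
    """One fixed-length CAT run with maximum-information selection."""
    administered, responses = [], []
    theta_hat = 0.0                      # conventional starting estimate
    for _ in range(test_length):
        info = fisher_info(theta_hat, a, b)
        info[administered] = -np.inf     # mask already-administered items
        item = int(np.argmax(info))      # maximum-information criterion
        u = float(rng.random() < p_2pl(true_theta, a[item], b[item]))
        administered.append(item)
        responses.append(u)
        theta_hat = mle_theta(np.array(responses),
                              a[administered], b[administered], theta_hat)
    se = 1.0 / np.sqrt(fisher_info(theta_hat, a[administered], b[administered]).sum())
    return theta_hat, se

rng = np.random.default_rng(1)
a = rng.uniform(0.8, 2.0, 300)           # toy 300-item pool
b = rng.normal(0.0, 1.0, 300)
print(simulate_cat(true_theta=0.5, a=a, b=b, test_length=30, rng=rng))
```

A full simulation study would sweep the true theta over a grid, replicate each condition many times, and summarize bias, standard errors, and item exposure rates from the output, which is the bookkeeping a dedicated tool like SimulCAT handles.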
Most computerized adaptive testing (CAT) programs do not allow test takers to review and change their responses because doing so could seriously degrade measurement efficiency and make tests vulnerable to manipulative test-taking strategies. Several modified testing methods have been developed that provide restricted review options while limiting the loss of CAT efficiency. The review options these methods give test takers, however, are still quite limited. This study proposes the item pocket (IP) method, a new testing approach that allows test takers greater flexibility in changing their responses by eliminating the restrictions that prevent them from moving across test sections to review their answers. A series of simulations was conducted to evaluate the robustness of the IP method against various manipulative test-taking strategies. The findings and implications of the study suggest that the IP method may be an effective solution for many CAT programs when the IP size and test time limit are properly set.
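The abstract describes the IP mechanic but not its algorithmic details. As a loose sketch only, the simulation below assumes a simplified version: the simulated examinee defers an item into the pocket with a fixed probability while the pocket has room, pocketed items are excluded from interim scoring and item selection, and all pocketed items are answered at the end of the test. The deferral rule, pocket handling, and 2PL scoring are guesses made for illustration, not the study's implementation; the helpers mirror those in the previous sketch.

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1 - p)

def mle_theta(u, a, b, theta0=0.0):
    theta = theta0
    for _ in range(20):
        p = p_2pl(theta, a, b)
        step = np.sum(a * (u - p)) / np.sum(a ** 2 * p * (1 - p))
        theta = float(np.clip(theta + step, -4.0, 4.0))
    return theta

def simulate_ip_cat(true_theta, a, b, test_length, pocket_size, rng):
    """CAT with an item pocket: up to pocket_size items may be
    deferred and are answered only at the end of the test."""
    administered, responses, pocket = [], [], []
    theta_hat = 0.0
    while len(administered) + len(pocket) < test_length:
        info = fisher_info(theta_hat, a, b)
        info[administered + pocket] = -np.inf
        item = int(np.argmax(info))
        # Assumed behavior: defer with 20% probability while room remains.
        if len(pocket) < pocket_size and rng.random() < 0.2:
            pocket.append(item)
            continue
        u = float(rng.random() < p_2pl(true_theta, a[item], b[item]))
        administered.append(item)
        responses.append(u)
        theta_hat = mle_theta(np.array(responses),
                              a[administered], b[administered], theta_hat)
    for item in pocket:                  # revisit pocketed items at the end
        u = float(rng.random() < p_2pl(true_theta, a[item], b[item]))
        administered.append(item)
        responses.append(u)
    return mle_theta(np.array(responses), a[administered], b[administered], theta_hat)

rng = np.random.default_rng(2)
a = rng.uniform(0.8, 2.0, 300)
b = rng.normal(0.0, 1.0, 300)
print(simulate_ip_cat(0.5, a, b, test_length=30, pocket_size=3, rng=rng))
```

Robustness checks like those in the study would pit this honest test taker against manipulative strategies across different pocket sizes and time limits.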