Several approaches to reporting subscale scores can be found in the literature. This research explores a multidimensional compensatory item response theory modeling approach, for both dichotomous and polytomous items, to subscale proficiency estimation, leading toward a more diagnostic solution. It also develops a Markov chain Monte Carlo (MCMC) estimation approach and examines its recovery of multidimensional item and ability parameters, as well as subscale proficiencies and classification rates. The simulation study presented here used parameters derived from real data from a large-scale statewide assessment with subscale score information, under varying conditions of sample size and correlations between subscales (.0, .1, .3, .5, .7, .9). It was found that to report accurate diagnostic information at the subscale level, the subscales need to be highly correlated, or a multidimensional approach should be implemented. MCMC estimation is still nascent in psychometrics; however, the growing body of research suggests a promising future.
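To make the compensatory model concrete, the sketch below shows the item response probability under a compensatory multidimensional 2PL model, where abilities on correlated subscales combine additively so that strength on one dimension can offset weakness on another. The specific discrimination and intercept values are illustrative placeholders, not parameters from the study.

```python
import numpy as np

def compensatory_m2pl(theta, a, d):
    """Probability of a correct response under a compensatory
    multidimensional 2PL model: P = 1 / (1 + exp(-(a . theta + d)))."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d)))

# Illustrative values: abilities on two correlated subscales,
# and an item that loads on both dimensions.
theta = np.array([0.5, -0.2])  # subscale abilities
a = np.array([1.2, 0.8])       # discrimination on each dimension
d = -0.3                       # intercept (easiness)
p = compensatory_m2pl(theta, a, d)
```

Because the model is compensatory, the logit is the weighted sum of the subscale abilities; a high score on one subscale raises the response probability even when the other subscale score is low.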
What is differential bundle functioning and how is this different from differential item functioning? Can test specifications be used to identify and aid in the interpretation of differential bundle functioning? How can differential bundle functioning lead to an improved understanding of why groups perform differently on achievement tests?
Progress has been made in developing statistical methods for identifying DIF items, but procedures to aid in the substantive interpretation of these items have lagged behind. To overcome this problem, Roussos and Stout (1996) proposed a multidimensionality‐based DIF analysis paradigm. We illustrate and evaluate an application of this framework as applied to the study of gender differences in mathematics. Four characteristics distinguish this study from previous research: the substantive analysis was guided by past research on the content and cognitive‐related sources of gender differences in mathematics achievement, as presented in the taxonomy by Gallagher, De Lisi, Holst, McGillicuddy‐De Lisi, Morely, and Cahalan (2000); the substantive analysis was conducted by reviewers who were highly knowledgeable about the cognitive strategies students use to solve math problems; three statistical methods were used to test hypotheses about gender differences: SIBTEST, DIMTEST, and multiple linear regression; and the data were from a curriculum‐based achievement test developed with the goal of minimizing obvious, content‐related gender differences. We show that the framework can lead to clearly interpretable results, and we highlight both the strengths and weaknesses of applying the Roussos and Stout framework to the study of group differences.
Multidimensional computerized adaptive testing (MCAT) is able to provide a vector of ability estimates for each examinee, which can be used to build a more informative profile of an examinee’s performance. The current literature on MCAT focuses on fixed-length tests, which can yield less accurate estimates for examinees whose abilities differ markedly from the average difficulty of the item bank, especially when the bank contains only a limited number of items. Therefore, instead of stopping the test at a predetermined fixed length, the authors use a more informative stopping criterion that is directly tied to measurement accuracy. Specifically, this research derives four stopping rules that either quantify the measurement precision of the ability vector (i.e., minimum determinant rule [D-rule], minimum eigenvalue rule [E-rule], and maximum trace rule [T-rule]) or quantify the amount of available information carried by each item (i.e., maximum Kullback–Leibler divergence rule [K-rule]). The simulation results showed that all four stopping rules successfully terminated the test when the mean squared error of ability estimation was within a desired range, regardless of examinees’ true abilities. It was found that when using the D-, E-, or T-rule, examinees with extreme abilities tended to receive tests twice as long as those received by examinees with moderate abilities. However, the test length difference under the K-rule was not as dramatic, indicating that the K-rule may not be very sensitive to measurement precision. In all cases, the cutoff value for each stopping rule needs to be adjusted on a case-by-case basis to find an optimal solution.
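The precision-based rules above can be sketched as checks on the accumulated Fisher information matrix after each administered item: the D-rule thresholds its determinant, the E-rule its smallest eigenvalue, and the T-rule its trace. This is a minimal illustration, assuming a two-dimensional test; the information matrix and cutoff values are made-up placeholders, since the paper tunes cutoffs case by case.

```python
import numpy as np

def check_stopping_rules(info, det_cut, eig_cut, trace_cut):
    """Evaluate precision-based MCAT stopping rules on the
    accumulated Fisher information matrix `info` (d x d).
    Returns which rules would terminate the test."""
    return {
        "D-rule": np.linalg.det(info) >= det_cut,          # determinant large enough
        "E-rule": np.linalg.eigvalsh(info).min() >= eig_cut,  # weakest direction precise enough
        "T-rule": np.trace(info) >= trace_cut,             # total information large enough
    }

# Hypothetical information matrix for two ability dimensions
# after several items, with illustrative cutoffs.
info = np.array([[6.0, 1.0],
                 [1.0, 4.0]])
rules = check_stopping_rules(info, det_cut=20.0, eig_cut=3.0, trace_cut=9.0)
```

The E-rule is the most conservative of the three in this sense: it requires adequate precision along every dimension, whereas the determinant or trace can be satisfied by high information on one dimension masking low information on another.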