Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study explored the effectiveness of the Wald test in detecting both uniform and nonuniform DIF in the DINA model through a simulation study. Results of this study suggest that for relatively discriminating items, the Wald test had Type I error rates close to the nominal level. Moreover, its viability was underscored by the medium to high power rates for most investigated DIF types when DIF size was large. Furthermore, the performance of the Wald test in detecting uniform DIF was compared to that of the traditional Mantel-Haenszel (MH) and SIBTEST procedures. The results of the comparison study showed that the Wald test was comparable to or outperformed the MH and SIBTEST procedures. Finally, the strengths and limitations of the proposed method and suggestions for future studies are discussed.
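As a point of reference for the comparison above, the common odds ratio that the Mantel-Haenszel procedure tests can be computed directly from per-stratum 2×2 tables of group by item response. The sketch below is a minimal illustration with invented counts, not the full MH chi-square test used in the study:

```python
import numpy as np

def mh_odds_ratio(right_ref, wrong_ref, right_foc, wrong_foc):
    """Mantel-Haenszel common odds ratio across score strata.

    Each argument is an array of per-stratum counts: examinees in the
    reference/focal group who answered the studied item right/wrong.
    A value near 1.0 indicates no uniform DIF on the studied item.
    """
    right_ref = np.asarray(right_ref, float)
    wrong_ref = np.asarray(wrong_ref, float)
    right_foc = np.asarray(right_foc, float)
    wrong_foc = np.asarray(wrong_foc, float)
    n = right_ref + wrong_ref + right_foc + wrong_foc  # stratum sizes
    num = (right_ref * wrong_foc / n).sum()
    den = (wrong_ref * right_foc / n).sum()
    return num / den

# Two score strata with identical success odds in both groups (no DIF):
alpha = mh_odds_ratio([20, 30], [10, 5], [40, 60], [20, 10])
```

With these counts the odds of success are equal across groups within each stratum, so the estimate is exactly 1.0.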
To examine whether children (mean age 34 months) can fast map and extend novel action labels to actions for which they do not already have names, the comprehension of familiar and novel verbs was tested using colored drawings of Sesame Street characters performing both familiar and unfamiliar actions. Children were asked to point to the character "verbing" from among sets of 4 drawings. With familiar words and actions, children made correct choices 97% of the time. With novel action words, children mostly performed at levels significantly above chance, selecting a previously unlabeled action or another token of a just-named action. In a second, control experiment, children were asked to select an action from among the same sets of 4 drawings, but they were not given a novel action name. Here children mainly performed at levels not significantly different from chance, showing that the results from the main experiment were attributable to the presence of a word in the request. Results of these studies are interpreted as support for the availability of principles that ease verb acquisition.
This article addresses testing the hypothesis of one versus more than one dominant (essential) dimension in the possible presence of minor dimensions. The method used is Stout's statistical test of essential unidimensionality. Differences between the traditional definition of dimensionality provided by item response theory, which counts all dimensions present, and essential dimensionality, which counts only dominant dimensions, are discussed. As Monte Carlo studies demonstrate, Stout's test of essential unidimensionality tends to indicate essential unidimensionality in the presence of one dominant dimension and one or more minor dimensions that have a relatively small influence on item scores. As the influence of the minor dimensions increases, Stout's test is more likely to reject the hypothesis of essential unidimensionality. To assist in interpreting these studies, a rough index of the deviation from essential unidimensionality is proposed.
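The contrast between one dominant dimension and additional minor dimensions can be illustrated, very roughly, with the eigenvalues of the inter-item correlation matrix. This is only a toy heuristic, not Stout's test or the proposed index; all loadings and sample sizes below are invented:

```python
import numpy as np

rng = np.random.default_rng(42)

n_persons, n_items = 1000, 20
theta = rng.normal(size=n_persons)              # dominant dimension
eta = rng.normal(size=n_persons)                # minor dimension
b = np.linspace(-1.5, 1.5, n_items)             # item difficulties
# Minor dimension loads weakly on half the items only:
minor = np.where(np.arange(n_items) < 10, 0.4, 0.0)

# Simulate binary responses from a two-dimensional logistic model.
logit = theta[:, None] + np.outer(eta, minor) - b
p = 1 / (1 + np.exp(-logit))
x = (rng.uniform(size=p.shape) < p).astype(float)

# Eigenvalues of the inter-item correlation matrix, largest first.
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]
ratio = eigvals[0] / eigvals[1]   # dominant-to-second eigenvalue ratio
```

Despite the minor dimension, the first eigenvalue dominates, which is the pattern under which Stout's test tends to retain essential unidimensionality.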
The present study examined the long-term usefulness of estimated parameters used to adjust the scores from a performance assessment to account for differences in rater stringency. Ratings from four components of the USMLE® Step 2 Clinical Skills Examination were analyzed. A generalizability-theory framework was used to examine the extent to which rater-related sources of error could be eliminated through statistical adjustment. Particular attention was given to the stability of these estimated parameters over time. The results suggest that rater stringency estimates obtained at one point in time and then used to adjust ratings over a period of months may substantially decrease in usefulness. In some cases, over several months, the use of these adjustments may become counterproductive. Additionally, it is hypothesized that the rate of deterioration in the usefulness of estimated parameters may be a function of the characteristics of the scale.
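A naive version of the kind of stringency adjustment studied here subtracts each rater's estimated stringency (mean deviation from the grand mean) from that rater's scores. The sketch below, with made-up ratings, deliberately ignores the generalizability-theory machinery and assumes raters score comparable examinees:

```python
import numpy as np

def stringency_adjust(ratings, rater_ids):
    """Subtract each rater's estimated stringency (that rater's mean
    minus the grand mean) from the ratings that rater produced."""
    ratings = np.asarray(ratings, float)
    rater_ids = np.asarray(rater_ids)
    grand = ratings.mean()
    adjusted = ratings.copy()
    for r in np.unique(rater_ids):
        mask = rater_ids == r
        adjusted[mask] -= ratings[mask].mean() - grand
    return adjusted

ratings = [6, 7, 8, 4, 5, 6]          # rater B scores ~2 points harsher
raters = ["A", "A", "A", "B", "B", "B"]
adj = stringency_adjust(ratings, raters)
```

After adjustment both raters' scores line up, but note that this confounds stringency with true examinee differences whenever raters see different examinees, which is exactly why stability of the estimates over time matters.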
This study investigated the impact of three prior distributions (matched, standard vague, and hierarchical) on Bayesian parameter recovery in one- and two-parameter IRT models. Two Bayesian estimation methods were used: Markov chain Monte Carlo (MCMC) and the relatively new variational Bayes (VB). Conditional (CML) and marginal maximum likelihood (MML) estimates served as baselines for comparison. Vague priors produced large errors or convergence issues and are not recommended. For both MCMC and VB, the hierarchical and matched priors yielded the lowest root mean squared errors (RMSEs) for ability estimates; RMSEs of difficulty estimates were similar across estimation methods. For the standard errors (SEs), MCMC-hierarchical displayed the largest values across most conditions, whereas SEs from VB estimation were among the lowest in all but one case. Overall, VB-hierarchical, VB-matched, and MCMC-matched performed best. VB with hierarchical priors is recommended for its accuracy and its cost and, consequently, time effectiveness.
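For concreteness, a minimal random-walk Metropolis sampler for a single examinee's ability under a one-parameter (Rasch) model with a N(0, 1) prior might look as follows. The item difficulties, responses, and tuning constants are invented, and this is far simpler than the full MCMC and VB machinery compared in the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def rasch_loglik(theta, b, x):
    """Rasch log-likelihood of responses x to items with difficulties b."""
    p = 1 / (1 + np.exp(-(theta - b)))
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

def metropolis_theta(b, x, prior_sd=1.0, n_iter=2000, step=0.5):
    """Random-walk Metropolis draws from the posterior of one
    examinee's ability, given known item difficulties b."""
    theta, draws = 0.0, []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        # Log posterior ratio: likelihood plus N(0, prior_sd^2) prior.
        log_r = (rasch_loglik(prop, b, x) - prop**2 / (2 * prior_sd**2)
                 - rasch_loglik(theta, b, x) + theta**2 / (2 * prior_sd**2))
        if np.log(rng.uniform()) < log_r:
            theta = prop
        draws.append(theta)
    return np.array(draws)

b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
x = np.array([1, 1, 1, 0, 0])              # 3 of 5 items correct
posterior = metropolis_theta(b, x)
theta_hat = posterior[500:].mean()         # posterior mean after burn-in
```

With 3 of 5 items correct on symmetric difficulties, the posterior mean lands a little above zero, pulled toward the prior mean relative to the maximum likelihood estimate.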
This article provides a detailed investigation of Stout's statistical procedure (the computer program DIMTEST) for testing the hypothesis that an essentially unidimensional latent trait model fits observed binary item response data from a psychological test. One finding was that DIMTEST may fail to perform as desired in the presence of guessing when coupled with many highly discriminating items. A revision of DIMTEST is proposed to overcome this limitation. Also, an automatic approach is devised to determine the size of the assessment subtests. Further, an adjustment is made to the estimated standard error of the statistic on which DIMTEST depends. These three refinements have led to an improved procedure that is shown in simulation studies to adhere closely to the nominal level of significance while achieving considerably greater power. Finally, DIMTEST is validated on a selection of real data sets.

Key words: unidimensionality, essential independence, essential unidimensionality, DIMTEST

Item response theory (IRT) is presently one of the most widely used techniques in psychometrics and is likely to remain so in the future. Some applications of IRT include ability estimation, item/test bias, equating, and adaptive testing. The three assumptions underlying many commonly used IRT models are monotonicity, unidimensionality (d = 1), and local independence (LI). Monotonicity assumes that the probability of correctly answering an item increases as ability increases. Unidimensionality assumes that