Purpose: In Mokken scaling, the Crit index was proposed and is sometimes used as evidence (or lack thereof) of violations of common model assumptions. The goal of our study was twofold: to make the formulation of the Crit index explicit and accessible, and to investigate its distribution under various measurement conditions.
Methods: We conducted two simulation studies in the context of dichotomously scored item responses, manipulating the type of assumption violation, the proportion of violating items, sample size, and item quality. False positive rates and power to detect assumption violations were our main outcome variables. In addition, we applied the Crit coefficient in a Mokken scale analysis of responses to the General Health Questionnaire (GHQ-12), a self-administered questionnaire for assessing current mental health.
Results: The false positive rates of Crit were close to the nominal rate in most conditions, and power to detect misfit depended on sample size, type of violation, and the number of assumption-violating items. Overall, in small samples Crit lacked the power to detect misfit, and in larger samples power differed considerably depending on the type of violation and the proportion of misfitting items. Our empirical example further suggested that even in large samples the Crit index may fail to detect assumption violations.
Discussion: Even in large samples, the Crit coefficient showed limited usefulness for detecting moderate and severe violations of monotonicity. Our findings are relevant to researchers and practitioners who use Mokken scaling for scale and questionnaire construction and revision.
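For readers unfamiliar with the quantities involved, the sketch below computes Loevinger's scalability coefficients (H_ij, H_j, H) for dichotomous items, the building blocks on which Mokken scale analysis and the Crit index rest. This is a minimal illustration in Python on simulated data, not the implementation used in the study (Mokken analyses are typically run with the mokken package in R), and it omits the violation counts that the Crit index itself aggregates.

```python
# Minimal sketch: Loevinger's scalability coefficients for dichotomous items.
# Illustrative only; the Crit index additionally combines H_j with the number
# and size of observed assumption violations (Molenaar & Sijtsma, 2000).
import numpy as np

def scalability_coefficients(X):
    """X: (n_persons, n_items) binary matrix. Returns H_ij, H_j, H."""
    p = X.mean(axis=0)                       # item popularities
    cov = np.cov(X, rowvar=False, ddof=0)    # observed inter-item covariances
    # Maximum covariance attainable given the marginals (Guttman ordering):
    # cov_max(i, j) = min(p_i, p_j) - p_i * p_j
    cov_max = np.minimum.outer(p, p) - np.outer(p, p)
    np.fill_diagonal(cov, 0.0)               # sums below run over i != j
    np.fill_diagonal(cov_max, 0.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        H_ij = np.where(cov_max > 0, cov / cov_max, np.nan)
    H_j = cov.sum(axis=1) / cov_max.sum(axis=1)   # item scalability
    H = cov.sum() / cov_max.sum()                 # total-scale scalability
    return H_ij, H_j, H

# Usage on Rasch-type simulated data (assumption-satisfying items):
rng = np.random.default_rng(1)
theta = rng.normal(size=500)
b = np.linspace(-1.5, 1.5, 6)                     # item difficulties
P = 1 / (1 + np.exp(-(theta[:, None] - b)))
X = (rng.random(P.shape) < P).astype(int)
_, H_j, H = scalability_coefficients(X)
print(np.round(H_j, 2), round(H, 2))              # typically H_j, H > 0.3 here
```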
Objectives: In this study, we examined the consequences of ignoring violations of the assumptions underlying the use of sum scores in assessing attention problems (AP), and whether psychometrically more refined models improve predictions of relevant outcomes in adulthood.
Methods: Data from the Tracking Adolescents' Individual Lives Survey (TRAILS) were used. AP symptom properties were examined using the AP scale of the Child Behavior Checklist at age 11. The consequences of model violations were evaluated in relation to psychopathology, educational attainment, financial status, and the ability to form relationships in adulthood.
Results: Symptoms differed with respect to information and difficulty. Moreover, evidence of multidimensionality was found, with two groups of items measuring sluggish cognitive tempo and attention-deficit/hyperactivity disorder symptoms. Item response theory analyses indicated that a bifactor model fitted these data better than competing models. In terms of accuracy of predicting functional outcomes, sum scores were robust against violations of assumptions in some situations; nevertheless, AP scores derived from the bifactor model showed some superiority over sum scores.
Conclusion: These findings show that more accurate predictions of later-life difficulties can be made if one uses a more suitable psychometric model to assess AP severity in children. This has important implications for research and clinical practice.
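The mechanism behind this result can be sketched in a few lines: when items load on a general factor plus specific factors (a bifactor structure) and a later outcome depends on the general factor only, sum scores blend the two sources of variance and lose predictive accuracy. All loadings, sample sizes, and effect values below are illustrative assumptions, not estimates from the TRAILS data.

```python
# Sketch: why multidimensionality can erode the predictive value of sum scores.
# Ten items load on a general factor g plus one of two specific factors; a
# later-life outcome depends on g only. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(7)
n = 2000
g = rng.normal(size=n)        # general AP severity
s1 = rng.normal(size=n)       # specific factor, items 1-5 (e.g., SCT-like)
s2 = rng.normal(size=n)       # specific factor, items 6-10

def items(fac_g, fac_s, a_g, a_s, b):
    eta = a_g * fac_g[:, None] + a_s * fac_s[:, None] - b
    return (rng.random(eta.shape) < 1 / (1 + np.exp(-eta))).astype(int)

b = np.linspace(-1, 1, 5)
X = np.hstack([items(g, s1, 1.0, 1.2, b), items(g, s2, 1.0, 1.2, b)])

outcome = 0.6 * g + rng.normal(scale=0.8, size=n)   # depends on g only
sum_score = X.sum(axis=1)
print("r(sum score, outcome):", np.corrcoef(sum_score, outcome)[0, 1].round(2))
print("r(g, outcome):        ", np.corrcoef(g, outcome)[0, 1].round(2))
# The gap between these correlations is the predictive accuracy a
# bifactor-based severity score could, at best, recover over sum scores.
```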
Executive Summary: The aim of this study was twofold. First, we investigated whether scores on an admission test lead to similar predictions of future study success when the test is administered in a proctored versus an unproctored setting. Second, we explored how Bayesian modeling can help in interpreting admission-testing data. Results showed that the mode of administration did not result in different models for predicting study success, and that Bayesian modeling provides a useful and easy-to-interpret framework for predicting the probability of future study success.
Arguably the most important aim of admission testing is the prediction of future academic success. Academic success is typically operationalized as GPA or study progress, but can also include leadership or citizenship (e.g., Stemler, 2012; Sternberg, 2010). In order to accept the students with the highest academic potential, students are admitted to college or graduate programs based on admission criteria such as scores on admission tests and other predictors such as high school performance (in the case of undergraduate admissions), undergraduate performance (in the case of graduate school admissions), biodata (such as life and work experience), personal statements, recommendations, and interviews (Clinedinst & Patel, 2018). Since access to higher education programs is an important determinant of later life outcomes, such as income, attitudes, and political behavior (Lemann, 1999, p. 6), it is important that admission procedures consist of fair and valid instruments and procedures.
The widespread use of computers allows for more varied forms of assessment, which makes admission testing more complex. Testing at a distance is now common, although it raises questions concerning the validity of the test results. Dishonest test behavior (e.g., cheating) is more difficult to control in unproctored, online tests. Furthermore, the security of test items is potentially jeopardized, which may contribute to inflated test scores. Hence, it is crucial to ascertain that test takers who are assessed at a distance (i.e., unproctored) are not advantaged over test takers who are assessed in a proctored environment.
In this study we investigate whether proctored and unproctored tests lead to different test results, and to differences in prediction, which is of major importance in admission testing. If unproctored test takers engage in cheating, we would expect their academic performance to be overpredicted; that is, they would perform less well academically than expected based on their admission test scores. We study differential prediction between unproctored and proctored tests using real admission test data. Specifically, we compare scores across the two groups by means of the moderated multiple regression model proposed by Lautenschlager and Mendoza (1986), under both the frequentist and the Bayesian paradigm. Our goal is to investigate whether differential prediction of first-year GPA exists between the proctored and unproctored groups.
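The moderated multiple regression comparison can be sketched as follows: regress first-year GPA on the admission-test score, then test whether adding administration mode and its interaction with the score improves prediction. The sketch below is a frequentist version on simulated data; the variable names and data are hypothetical, and the Bayesian analogue would place priors on the same coefficients and compare the models via, for example, Bayes factors.

```python
# Sketch of the moderated multiple regression comparison: does adding
# administration mode (proctored vs. unproctored) and its interaction with
# the admission-test score improve prediction of first-year GPA? The
# Lautenschlager-Mendoza procedure compares such nested models step-down.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "score": rng.normal(0, 1, n),
    "proctored": rng.integers(0, 2, n),   # 0 = unproctored, 1 = proctored
})
df["gpa"] = 0.5 * df["score"] + rng.normal(0, 0.7, n)  # no true group effect

restricted = smf.ols("gpa ~ score", data=df).fit()
full = smf.ols("gpa ~ score * proctored", data=df).fit()  # adds intercept and slope differences
print(anova_lm(restricted, full))   # F test: any differential prediction?
```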
In this chapter, the practical consequences of violations of unidimensionality on selection decisions in the framework of unidimensional item response theory (IRT) models are investigated using simulated data. The manipulated factors include the severity of the violations, the proportion of misfitting items, and test length. The outcomes considered are the precision and accuracy of the estimated model parameters; the correlations of the estimated ability (θ-hat) and number-correct (NC) scores with the true ability (θ); the ranks of the examinees and the overlap between the sets of examinees selected based on θ, θ-hat, or NC scores; and the bias in criterion-related validity estimates. Results show that the θ-hat values were unbiased by violations of unidimensionality, but their precision decreased as multidimensionality and the proportion of misfitting items increased; the estimated item parameters were robust to violations of unidimensionality. The correlations between θ, θ-hat, and NC scores, the agreement between the three selection criteria, and the accuracy of criterion-related validity estimates were all negatively affected, to some extent, by increasing levels of multidimensionality and proportion of misfitting items. However, removing the misfitting items improved the results only in the case of severe multidimensionality and a large proportion of misfitting items, and deteriorated them otherwise.
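A compressed sketch of this kind of simulation design is given below, under assumed (illustrative) conditions rather than the chapter's actual ones: a proportion of items additionally loads on a nuisance dimension, with the size of that loading controlling severity, and the output tracks how the correlation of the number-correct score with the target ability degrades.

```python
# Sketch: dichotomous responses from a 2PL model in which a proportion of
# items also loads on a nuisance dimension eta. Severity is the nuisance
# loading; we track r(NC, theta). All design values are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n, k = 5000, 30

def simulate(prop_misfit, severity):
    theta = rng.normal(size=n)              # target ability
    eta = rng.normal(size=n)                # nuisance dimension
    a = rng.uniform(0.8, 2.0, size=k)       # discriminations
    b = rng.normal(size=k)                  # difficulties
    a2 = np.zeros(k)
    a2[: int(prop_misfit * k)] = severity   # misfitting items load on eta
    lin = a * theta[:, None] + a2 * eta[:, None] - a * b
    X = (rng.random((n, k)) < 1 / (1 + np.exp(-lin))).astype(int)
    nc = X.sum(axis=1)                      # number-correct score
    return np.corrcoef(nc, theta)[0, 1]

for prop in (0.1, 0.3, 0.5):
    for sev in (0.5, 1.0, 2.0):
        print(f"prop={prop:.1f} severity={sev:.1f} "
              f"r(NC, theta)={simulate(prop, sev):.3f}")
```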