Probabilistic models with one or more latent variables are designed to report on a corresponding number of skills or cognitive attributes. Multidimensional skill profiles offer additional information beyond what a single test score can provide, if the reported skills can be identified and distinguished reliably. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods such as Markov chain Monte Carlo, since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a general diagnostic model (GDM) that can be estimated with standard ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The paper uses one member of a larger class of diagnostic models, a compensatory diagnostic model for dichotomous and partial credit data. Many well-known models, such as univariate and multivariate versions of the Rasch model, the two-parameter logistic item response theory model, and the generalized partial credit model, as well as a variety of skill profile models, are special cases of this GDM. In addition to an introduction to this model, the paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL Internet-based testing.
The present investigation has a dual focus: to evaluate problematic practice in the use of item parcels and to suggest exploratory structural equation models (ESEMs) as a viable alternative to the traditional independent clusters confirmatory factor analysis (ICM-CFA) model (with no cross-loadings, subsidiary factors, or correlated uniquenesses). Typically, it is ill-advised to (a) use item parcels when ICM-CFA models do not fit the data, and (b) retain ICM-CFA models when items cross-load on multiple factors. However, the combined use of (a) and (b) is widespread and often provides such misleadingly good fit indexes that applied researchers might believe that misspecification problems are resolved: that 2 wrongs really do make a right. Taking a pragmatist perspective, in 4 studies we demonstrate with responses to the Rosenberg Self-Esteem Inventory (Rosenberg, 1965), Big Five personality factors, and simulated data that even small cross-loadings seriously distort relations among ICM-CFA constructs or even decisions on the number of factors; although obvious in item-level analyses, this is camouflaged by the use of parcels. ESEMs provide a viable alternative to ICM-CFAs and a test for the appropriateness of parcels. The use of parcels with an ICM-CFA model is most justifiable when the fit of both ICM-CFA and ESEM models is acceptable and equally good, and when substantively important interpretations are similar. However, if the ESEM model fits the data better than the ICM-CFA model, then the use of parcels with an ICM-CFA model typically is ill-advised, particularly in studies that are also interested in scale development, latent means, and measurement invariance.
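The distorting effect of small cross-loadings on composite (parcel-style) scores can be illustrated analytically. The sketch below is a hypothetical numerical example, not taken from the studies above: two factors correlate at .30, three unit-variance items load .70 on their target factor, and the cross-loading size is varied. The correlation between unit-weighted scale scores shifts markedly once even small cross-loadings of .10 are present.

```python
import numpy as np

# Hypothetical illustration: two factors with true correlation phi = .30,
# three items per factor loading .70 on their target factor, and a small
# cross-loading on the non-target factor. All values are invented.
phi = 0.30
Phi = np.array([[1.0, phi], [phi, 1.0]])

def composite_corr(cross):
    # Loading matrix: items 1-3 target factor 1, items 4-6 target factor 2.
    L = np.array([[0.7, cross]] * 3 + [[cross, 0.7]] * 3)
    common = L @ Phi @ L.T                   # model-implied common variance
    Theta = np.diag(1.0 - np.diag(common))   # uniquenesses -> unit item variance
    Sigma = common + Theta                   # model-implied item covariance matrix
    w1 = np.r_[np.ones(3), np.zeros(3)]      # unit-weighted composite, scale 1
    w2 = np.r_[np.zeros(3), np.ones(3)]      # unit-weighted composite, scale 2
    cov = w1 @ Sigma @ w2
    return cov / np.sqrt((w1 @ Sigma @ w1) * (w2 @ Sigma @ w2))

print(composite_corr(0.0))  # composite correlation without cross-loadings
print(composite_corr(0.1))  # small cross-loadings inflate the correlation
```

With no cross-loadings the composite correlation is attenuated below .30 by item uniqueness; adding .10 cross-loadings pushes it well above .30, illustrating how relations among constructs are distorted when the cross-loadings are ignored.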
Large-scale educational surveys are low-stakes assessments of educational outcomes conducted using nationally representative samples. In these surveys, students do not receive individual scores, and the outcome of the assessment is inconsequential for respondents. The low-stakes nature of these surveys, as well as variations in average performance across countries and other factors such as different testing traditions, are contributing factors to the number of omitted responses in these assessments. While the underlying reasons for omissions are not completely understood, common practice in international assessments is to employ a deterministic treatment of omitted responses. Two model-based approaches were compared on the basis of simulated data and data from about 250,000 students from 30 Organisation for Economic Co-operation and Development (OECD) member countries participating in an international large-scale assessment.
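A minimal sketch of the kind of deterministic treatment referred to above, under the common convention (an assumption here, not a rule stated in the abstract) that missing responses before the last answered item are omitted and scored as incorrect, while trailing missing responses are treated as not reached and left unscored:

```python
import numpy as np

def deterministic_score(responses):
    # Hypothetical sketch: omitted items (missing before the last answered
    # item) are scored as incorrect; the trailing run of missing responses
    # is treated as not reached and left unscored.
    # Coding: 1/0 = correct/incorrect, np.nan = no response.
    r = np.asarray(responses, dtype=float)
    observed = ~np.isnan(r)
    last_seen = observed.nonzero()[0].max() if observed.any() else -1
    scored = r.copy()
    scored[np.isnan(r) & (np.arange(r.size) <= last_seen)] = 0.0
    return scored

print(deterministic_score([1, np.nan, 0, np.nan, np.nan]))
```

Model-based approaches, by contrast, let the measurement model represent the missingness rather than imposing a fixed scoring rule.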
Probabilistic models with more than one latent variable are designed to report profiles of skills or cognitive attributes. Testing programs want to offer additional information beyond what a single test score can provide using these skill profiles. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods such as Markov chain Monte Carlo (MCMC), since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a class of general diagnostic models (GDMs) that can be estimated with customary ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The model and the algorithm for estimating model parameters handle missing responses directly, without the need to collapse categories or recode the data. Within the class of GDMs, compensatory as well as noncompensatory models may be specified. This report uses one member of this class of diagnostic models, a compensatory diagnostic model that is parameterized similarly to the generalized partial credit model (GPCM). Many well-known models, such as uni- and multivariate versions of the Rasch model, the two-parameter logistic item response theory (2PL-IRT) model, the GPCM, and the FACETS model, as well as a variety of skill profile models, are special cases of this member of the class of GDMs. This paper describes an algorithm that capitalizes on tools from item response theory for scale linking, item fit, and parameter estimation. In addition to an introduction to the class of GDMs and to the partial credit instance of this class for dichotomous and polytomous skill profiles, this paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL® Internet-based testing (iBT).
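As a sketch of the compensatory partial-credit member of the class described above, the response probability can be written in a GPCM-like form. The notation here is an assumption reconstructed from that parameterization: $q_{ik}$ are Q-matrix entries linking item $i$ to skill $k$, $\gamma_{ik}$ are slope parameters, $\beta_{ix}$ are category thresholds, and $m_i$ is the number of score categories of item $i$.

```latex
P(X_i = x \mid \boldsymbol{\theta}) =
  \frac{\exp\!\Big(\beta_{ix} + \sum_{k=1}^{K} x\,\gamma_{ik}\,q_{ik}\,\theta_k\Big)}
       {1 + \sum_{y=1}^{m_i} \exp\!\Big(\beta_{iy} + \sum_{k=1}^{K} y\,\gamma_{ik}\,q_{ik}\,\theta_k\Big)}
```

With a single skill ($K = 1$, $q_{i1} = 1$) this reduces to the GPCM; for dichotomous items it further reduces to the 2PL-IRT model, and fixing $\gamma_{i1} = 1$ yields the Rasch model.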
The technical complexities and sheer size of international large-scale assessment (LSA) databases often cause hesitation on the part of the applied researcher interested in analyzing them. Further, inappropriate choice or application of statistical methods is a common problem in applied research using these databases. This article serves as a primer for researchers on the issues and methods necessary for obtaining unbiased results from LSA data. The authors outline the issues surrounding the analysis and reporting of LSA data, with a particular focus on three prominent international surveys. In addition, they make recommendations targeted at applied researchers regarding best analysis and reporting practices when using these databases.
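One method such primers typically cover is the combination of analyses run on multiple plausible values via Rubin's rules; the sketch below shows that combination step, with made-up country-mean estimates and sampling variances (which in practice would come from a replication method such as jackknife or balanced repeated replication).

```python
import numpy as np

def combine_plausible_values(estimates, sampling_vars):
    # Rubin's rules for combining an estimate computed separately on each
    # of M plausible values. `estimates` and `sampling_vars` are made-up
    # illustration values, one pair per plausible value.
    est = np.asarray(estimates, dtype=float)
    u = np.asarray(sampling_vars, dtype=float)
    m = len(est)
    pooled = est.mean()                      # final point estimate
    u_bar = u.mean()                         # average sampling variance
    b = est.var(ddof=1)                      # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b      # total error variance
    return pooled, np.sqrt(total_var)

est, se = combine_plausible_values([502.1, 498.7, 500.9, 501.5, 499.3],
                                   [4.0, 4.2, 3.9, 4.1, 4.0])
print(est, se)
```

Ignoring the between-imputation component (i.e., analyzing only one plausible value) understates the standard error, which is one of the reporting pitfalls such primers warn against.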
This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions, administered with a 5-point rating scale to 2,026 U.S. students. It is shown that this approach can be used to test whether RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS appear to be unidimensional after exclusion of only 1 item, a unidimensional measure for the midpoint RS is obtained only after exclusion of 10 items. Both RS measurements show high cross-scale correlations and item response theory-based (marginal) reliabilities. Cultural differences were found in the tendency to give extreme responses. Moreover, it is shown how to score rating data to correct for RS once they have been shown to exist in the data.
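The processing-tree decomposition described above can be sketched as a recoding of each observed rating into binary pseudo-items, one per assumed response process. The sketch below is a hypothetical implementation for a 5-point scale (categories 1-5), with a midpoint process, a direction process, and an extremity process; the exact tree in the study may differ.

```python
def decompose(rating):
    # Recode one 5-point rating into three binary pseudo-items, one per
    # assumed response process. None marks processes that are not executed
    # (structurally missing) for that rating.
    midpoint = 1 if rating == 3 else 0           # midpoint RS process
    if midpoint:
        return {"midpoint": 1, "direction": None, "extremity": None}
    direction = 1 if rating > 3 else 0           # trait-related direction
    extremity = 1 if rating in (1, 5) else 0     # extreme RS process
    return {"midpoint": 0, "direction": direction, "extremity": extremity}

print(decompose(5))
```

Each pseudo-item is then modeled by its own latent variable, which is what allows the RS dimensions to be separated from the trait-related responses.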
Item nonresponse is a common problem in educational and psychological assessments. The probability of unplanned missing responses due to omitted and not-reached items may depend stochastically on unobserved variables such as the missing responses themselves or latent variables. In such cases, missingness cannot be ignored and needs to be considered in the model. Specifically, multidimensional IRT models, latent regression models, and multiple-group IRT models have been suggested for handling nonignorable missing responses in latent trait models. However, the suitability of these models with respect to omitted and not-reached items has rarely been addressed. Missingness is formalized by response indicators that are modeled jointly with the researcher's target model. We demonstrate that response indicators have different statistical properties depending on whether the items were omitted or not reached. The implications of these differences are used to derive a joint model for nonignorable missing responses that can appropriately account for both omitted and not-reached items. The performance of the model is demonstrated by means of a small simulation study.
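The distinction between the two kinds of missingness can be made concrete by constructing the response indicators from the data. A minimal sketch, assuming the usual convention that a trailing run of missing responses is "not reached" while any earlier missing response is "omitted":

```python
import numpy as np

def classify_missing(responses):
    # Label each entry of a person-by-item response matrix (np.nan = no
    # response) as observed, omitted (a later item was answered), or
    # not reached (trailing run of missing responses). Joint models as
    # described above treat the two kinds of indicators differently.
    r = np.asarray(responses, dtype=float)
    observed = ~np.isnan(r)
    labels = np.full(r.shape, "observed", dtype=object)
    for i, row in enumerate(observed):
        idx = row.nonzero()[0]
        last = idx.max() if idx.size else -1
        for j in range(r.shape[1]):
            if not row[j]:
                labels[i, j] = "omitted" if j <= last else "not_reached"
    return labels

labels = classify_missing([[1, np.nan, 0, np.nan, np.nan],
                           [np.nan, 1, 1, 1, 1]])
print(labels)
```

The not-reached indicators form a monotone pattern within each person (once a person stops, all later indicators are missing), whereas omission indicators do not, which is one source of the different statistical properties noted above.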
In low-stakes assessments, test performance has few or no consequences for examinees themselves, so that examinees may not be fully engaged when answering the items. Instead of engaging in solution behaviour, disengaged examinees might randomly guess or generate no response at all. When ignored, examinee disengagement poses a severe threat to the validity of results obtained from low-stakes assessments. Statistical modelling approaches in educational measurement have been proposed that account for non-response or for guessing, but do not consider both types of disengaged behaviour simultaneously. We bring together research on modelling examinee engagement and research on missing values and present a hierarchical latent response model for identifying and modelling the processes associated with examinee disengagement jointly with the processes associated with engaged responses. To that end, we employ a mixture model that identifies disengagement at the item-by-examinee level by assuming different data-generating processes underlying item responses and omissions, respectively, as well as different response times associated with engaged and disengaged behaviour. By modelling examinee engagement within a latent response framework, the model allows assessing how examinee engagement relates to ability and speed, as well as identifying items that are likely to evoke disengaged test-taking behaviour. An illustration of the model by means of an application to real data is presented.
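The role response times play in the mixture can be sketched with a toy classification step. All parameter values below are invented for illustration; in the actual model they are estimated jointly with the response and omission processes rather than fixed in advance.

```python
import math

def lognormal_pdf(t, mu, sigma):
    # Density of a lognormal response-time distribution.
    return math.exp(-((math.log(t) - mu) ** 2) / (2 * sigma ** 2)) / (
        t * sigma * math.sqrt(2 * math.pi))

def p_disengaged(rt, pi_dis=0.1, mu_dis=0.5, mu_eng=3.0, sigma=0.5):
    # Posterior probability that an item-by-examinee response was
    # disengaged, given its response time `rt` (seconds), a hypothetical
    # disengagement proportion pi_dis, and made-up lognormal parameters
    # for the fast disengaged and slower engaged components.
    f_dis = pi_dis * lognormal_pdf(rt, mu_dis, sigma)
    f_eng = (1 - pi_dis) * lognormal_pdf(rt, mu_eng, sigma)
    return f_dis / (f_dis + f_eng)

print(p_disengaged(2.0))   # very fast response: likely disengaged
print(p_disengaged(25.0))  # slower response: likely engaged
```

In the full hierarchical model this classification is latent at the item-by-examinee level and is informed by the response (or omission) as well as the response time, not by the time alone.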