Recently, Shealy and Stout (1993) proposed a DIF detection procedure, SIBTEST, which 1) is IRT-model-based, 2) is nonparametric, 3) does not require IRF estimation, 4) provides a test of significance, and 5) estimates the amount of DIF. Current versions of SIBTEST can be used only for dichotomously scored items; in this paper, an extension to handle polytomous items is developed. This paper presents: (1) a discussion of an appropriate definition of DIF for polytomously scored items, (2) a modified SIBTEST procedure for detecting DIF in polytomous items, and (3) the results of two simulation studies comparing the modified SIBTEST with the Mantel and SMD procedures, one study with data constrained by a Rasch-like partial credit model (the same discrimination across polytomous items) and the other with data having distinctly different discriminations across items. These simulation studies indicate that the strategy of including the studied item in the matching subtest to control impact-induced Type I error (error induced by group ability differences) tends to yield unacceptably inflated Type I error rates when the equal-discrimination condition is violated. The studies provide compelling evidence that the modified SIBTEST procedure is much more robust in controlling impact-induced Type I error inflation than the other procedures.
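Although the abstract does not spell out the statistic, the matching idea behind SIBTEST can be illustrated in code. The sketch below is a simplified Python illustration, not the authors' implementation: it omits the regression correction that full SIBTEST applies to the matched subtest means, and the function name and interface are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def sibtest_simple(subtest_ref, item_ref, subtest_foc, item_foc):
    """Simplified SIBTEST-style DIF statistic (no regression correction).

    All inputs are 1-D NumPy arrays. Examinees are matched on their
    matching-subtest score; beta estimates the weighted difference in
    studied-item means between reference and focal groups. The full
    SIBTEST of Shealy & Stout (1993) also regression-corrects the
    matched means; that step is omitted in this sketch.
    """
    scores = np.union1d(subtest_ref, subtest_foc)
    n_total = len(item_ref) + len(item_foc)
    beta, var = 0.0, 0.0
    for k in scores:
        r = item_ref[subtest_ref == k]
        f = item_foc[subtest_foc == k]
        if len(r) < 2 or len(f) < 2:
            continue  # need both groups at this matching-score level
        p_k = (len(r) + len(f)) / n_total   # weight for this stratum
        beta += p_k * (r.mean() - f.mean())
        var += p_k**2 * (r.var(ddof=1) / len(r) + f.var(ddof=1) / len(f))
    z = beta / np.sqrt(var)
    return beta, z, 2 * norm.sf(abs(z))     # beta-hat, z, two-sided p
```

The omitted regression correction is the step Shealy and Stout introduced to remove impact-induced bias in the matched means, which is central to the robustness findings summarized above.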
A literature review was conducted to determine the current state of knowledge concerning the effects of computer administration of standardized educational and psychological tests on the psychometric properties of these instruments. Studies were grouped according to a number of factors relevant to the administration of tests by computer. Based on the studies reviewed, we arrived at the following conclusions. The rate at which test-takers omit items in an automated test may differ from the rate at which they omit items in a conventional presentation. Scores on automated versions of personality inventories such as the Minnesota Multiphasic Personality Inventory are lower than scores obtained in the conventional testing format; these differences may result in part from differing omit rates, as described above, but some may be caused by other factors. Scores from automated versions of speeded tests are not likely to be comparable with scores from paper-and-pencil versions. The presentation of graphics in an automated test may affect score equivalence: such effects were obtained in studies using the Hidden Figures Test, but not in studies with three Armed Services Vocational Aptitude Battery (ASVAB) tests. Tests containing items based on reading passages can become more difficult when presented on a CRT, as demonstrated in a single study with the ASVAB tests. Because examinees who take both versions of a test may carry over practice differently depending on which mode they encounter first, the possibility of such asymmetric practice effects may make it wise to avoid conducting equating studies based on single-group counterbalanced designs.
Background: The COVID-19 pandemic profoundly affected food systems, including food security. Understanding how the COVID-19 pandemic affected food security is important for providing support and identifying long-term impacts and needs.
Objective: The National Food Access and COVID research Team (NFACT) was formed to assess food security across different U.S. study sites throughout the pandemic, using common instruments and measurements. This study presents results from 18 study sites across 15 states, and nationally, over the first year of the COVID-19 pandemic.
Methods: A validated survey instrument was developed and implemented, in whole or in part, through online surveys of adults across the sites throughout the first year of the pandemic, representing 22 separate surveys. Sampling methods for each study site were convenience, representative, or targeted to high-risk populations. Food security was measured using the USDA six-item module, and its prevalence was analyzed using analysis of variance by sampling method to identify statistically significant differences.
Results: Respondents (n = 27,168) reported a higher prevalence of food insecurity (low or very low food security) since the COVID-19 pandemic than before it. In nearly all study sites, the prevalence of food insecurity was higher among Black, Indigenous, and People of Color (BIPOC), households with children, and those with job disruptions. The findings demonstrate lingering food insecurity, with high prevalence over time in sites with repeat cross-sectional surveys. There were no statistically significant differences between convenience and representative surveys, but the prevalence of food insecurity was statistically significantly higher in high-risk samples than in convenience samples.
Conclusions: This comprehensive study demonstrates a higher prevalence of food insecurity in the first year of the COVID-19 pandemic. These impacts were concentrated in certain demographic groups and most pronounced in surveys targeting high-risk populations. The results especially document continued high levels of food insecurity, as well as variability in estimates due to survey implementation method.
Summary: This multi-site assessment demonstrates widespread food insecurity during COVID-19 across multiple survey methods, especially among households with children, those with job disruptions, and Black, Indigenous, and People of Color.
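To make the measurement and comparison concrete, here is a minimal Python sketch. It assumes the standard USDA six-item short-form scoring, under which a raw score of 2 or more affirmative responses is classified as food insecure, and uses hypothetical array names; the paper's actual analysis pipeline is not specified in the abstract.

```python
import numpy as np
from scipy.stats import f_oneway

def food_insecure(raw_score: np.ndarray) -> np.ndarray:
    """USDA six-item short form: raw scores of 2-6 (low or very low
    food security) are classified as food insecure."""
    return raw_score >= 2

def prevalence_anova(raw_score: np.ndarray, method: np.ndarray):
    """One-way ANOVA on food-insecurity status across sampling methods.

    `raw_score`: each respondent's 0-6 raw score on the six-item module.
    `method`: the sampling method label for each respondent, e.g.
    'convenience', 'representative', or 'high-risk' (hypothetical labels).
    """
    insecure = food_insecure(raw_score).astype(float)
    groups = [insecure[method == m] for m in np.unique(method)]
    f_stat, p_value = f_oneway(*groups)  # compare prevalence across methods
    return f_stat, p_value
```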
item response theory, polytomous item, partial credit model, generalized partial credit model, graded response model, invariance, ordered categories
The purpose of this project was to evaluate statistical procedures for assessing differential item functioning (DIF) in polytomous items (items with more than two score categories). Three descriptive statistics were considered: the Standardized Mean Difference (SMD; Dorans & Schmitt, 1991) and two indices based on SIBTEST (Shealy & Stout, 1993). Five inferential procedures were also considered: two based on SMD, two based on SIBTEST, and the Mantel (1963) method. The DIF procedures were evaluated through applications to simulated data, as well as to data from ETS tests. The simulation included conditions in which the two groups of examinees had the same ability distribution and conditions in which the group means differed by one standard deviation. When the two groups had the same distribution, the descriptive index that performed best was the SMD. When the two groups had different distributions, a modified form of the SIBTEST DIF effect size measure tended to perform best. The five inferential procedures performed almost indistinguishably when the two groups had identical distributions. When the two groups had different distributions and the studied item was highly discriminating, the SIBTEST procedures showed much better Type I error control than did the SMD and Mantel methods, particularly in short tests. The power ranking of the five procedures was inconsistent; it depended on the direction of DIF and other factors. Routine application of these polytomous DIF methods at ETS seems feasible in cases where a reliable test is available for matching examinees. For the Mantel and SMD methods, Type I error control may be a concern under certain conditions. In the case of SIBTEST, the current version cannot easily accommodate matching tests that do not use number-right scoring. Additional research in these areas is likely to be useful.
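For contrast with the SIBTEST sketch given earlier, the following is a minimal sketch of an SMD-style descriptive index under its usual definition (the focal-group-weighted difference of matched item means). The function name and interface are illustrative, not ETS code, and implementations differ in sign convention and additional scaling.

```python
import numpy as np

def smd(subtest_ref, item_ref, subtest_foc, item_foc):
    """Standardized Mean Difference in the spirit of Dorans & Schmitt
    (1991): at each matching-score level, take the focal-minus-reference
    difference in studied-item means, weighted by the proportion of the
    focal group at that level. All inputs are 1-D NumPy arrays.
    """
    total_foc = len(item_foc)
    value = 0.0
    for k in np.unique(subtest_foc):
        f = item_foc[subtest_foc == k]
        r = item_ref[subtest_ref == k]
        if len(r) == 0:
            continue  # level not represented in the reference group
        w_fk = len(f) / total_foc           # focal proportion at level k
        value += w_fk * (f.mean() - r.mean())
    return value
```

Unlike the SIBTEST statistic, this index applies no correction to the matched means, which is consistent with the finding above that its Type I error control degrades when the groups differ in ability and the studied item is highly discriminating.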