A test may be unfair when students with the same knowledge but from different demographic groups perform differently on its items. Identifying and addressing this differential item functioning (DIF) helps ensure a fair, unbiased test. This Research Methods paper will help biology education researchers identify DIF items in their assessments.
Although the root-mean squared deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessments, this paper shows that its sensitivity for detecting misfit may depend strongly on the proficiency distribution of the countries considered. Specifically, items for which most respondents in a country have a very low (or high) probability of providing a correct answer will rarely be flagged by the RMSD as showing misfit, even when very strong DIF is present. With many international large-scale assessment initiatives moving toward covering a more heterogeneous group of countries, this raises concerns about the ability of the RMSD to detect item-level misfit, especially in low-performing countries that are not well aligned with the overall difficulty level of the test. This may put one at risk of incorrectly assuming measurement invariance to hold, and may also inflate estimated between-country differences in proficiency. The degree to which the RMSD is able to detect DIF in low-performing countries is studied using both an empirical example from PISA 2015 and a simulation study.
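The mechanism described above can be illustrated with a minimal numerical sketch. The RMSD weights the squared gap between the country-specific and international item response curves by the country's proficiency density, so for a hard item the gap falls in a region where a low-performing country has almost no probability mass. The item parameters and proficiency distributions below are hypothetical, chosen only to show the effect; this is not the operational PISA computation.

```python
import numpy as np

def irf(theta, a, b):
    """2PL item response function: P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def rmsd(theta, density, p_country, p_international):
    """RMSD item-fit statistic: squared deviations between the country-specific
    and international item response curves, weighted by the country's
    proficiency density."""
    w = density / density.sum()
    return np.sqrt(np.sum(w * (p_country - p_international) ** 2))

theta = np.linspace(-6.0, 6.0, 601)

# Hypothetical hard item with strong uniform DIF: the country-specific
# curve is one logit harder than the international curve.
p_int = irf(theta, a=1.0, b=2.0)
p_cty = irf(theta, a=1.0, b=3.0)

# The same DIF evaluated under two proficiency distributions: a
# low-performing country (mean -2) and a country whose proficiency is
# aligned with the item's difficulty (mean 2).
rmsd_low = rmsd(theta, normal_pdf(theta, -2.0, 1.0), p_cty, p_int)
rmsd_aligned = rmsd(theta, normal_pdf(theta, 2.0, 1.0), p_cty, p_int)
```

An identical one-logit DIF effect yields a much smaller RMSD for the low-performing country, so under any fixed flagging cutoff the misfitting item is far less likely to be detected there.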
Participation in international large-scale assessments has grown over time, with the largest, the Programme for International Student Assessment (PISA), including more than 70 education systems that are economically and educationally diverse. To accommodate large achievement differences among participants, in 2009 PISA offered low-performing systems the option of including an easier set of items in the assessment, with the aim of providing improved achievement estimates. However, there remains a lack of evidence on the performance of this design innovation. We therefore simulate a design that closely mirrors the PISA 2015 math assessment in order to empirically examine the benefits of including easy items for low-performing countries. We extend the PISA design to include larger numbers of easy items and items that are easier than those currently implemented. Findings show that the current PISA approach provides little advantage compared to a common test for all participants. Our study also demonstrates persistent bias, low coverage rates, and low correlations between generating and estimated proficiency under current designs. Through our simulation we also show that, to improve achievement estimation for low performers, about half of the items would need to be made substantially easier.
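The kind of bias this study targets can be sketched in a few lines: when a test is far too hard for a population, likelihood-based proficiency estimates carry little information and are pulled toward the population prior. The sketch below uses a simple Rasch model with EAP scoring and invented item pools (it does not reproduce the PISA design or its plausible-value machinery); `b_standard` and `b_easier` are hypothetical difficulty sets, with half of the easier pool shifted well below the standard range.

```python
import numpy as np

rng = np.random.default_rng(2015)

def simulate(theta, b):
    """Draw Rasch (1PL) item responses for abilities theta and difficulties b."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(float)

def eap(responses, b, grid=np.linspace(-6.0, 6.0, 121)):
    """Expected-a-posteriori proficiency estimates under a N(0, 1) prior."""
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - b[None, :])))  # (G, J)
    # Log-likelihood of each response pattern at each grid point: (N, G).
    ll = responses @ np.log(p).T + (1.0 - responses) @ np.log(1.0 - p).T
    post = np.exp(ll - ll.max(axis=1, keepdims=True)) * np.exp(-0.5 * grid**2)
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid

# A low-performing population, true proficiency -2 on the reporting scale.
theta_true = np.full(4000, -2.0)

# Hypothetical pools: a standard test aimed at the international average
# (difficulties 0 to 3) versus a variant where half of the items are
# replaced by much easier ones (difficulties -3 to 0).
b_standard = np.linspace(0.0, 3.0, 30)
b_easier = np.concatenate([np.linspace(-3.0, 0.0, 15), np.linspace(0.0, 3.0, 15)])

bias_standard = eap(simulate(theta_true, b_standard), b_standard).mean() - (-2.0)
bias_easier = eap(simulate(theta_true, b_easier), b_easier).mean() - (-2.0)
```

Both estimates are shrunk toward the prior mean of 0, but replacing half of the pool with substantially easier items markedly reduces the upward bias for low performers, which is the intuition behind the paper's finding.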