A test may be unfair when students with the same knowledge but from different demographic groups perform differently on its items. Identifying and addressing this differential item functioning (DIF) helps ensure a fair, unbiased test. This Research Methods paper will help biology education researchers identify DIF items in their assessments.
Although the root-mean squared deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessments, this paper shows that its sensitivity for detecting misfit may depend strongly on the proficiency distribution of the countries considered. Specifically, items for which most respondents in a country have a very low (or high) probability of providing a correct answer will rarely be flagged by the RMSD as showing misfit, even when very strong DIF is present. With many international large-scale assessment initiatives moving toward covering a more heterogeneous group of countries, this raises concerns about the ability of the RMSD to detect item-level misfit, especially in low-performing countries that are not well aligned with the overall difficulty level of the test. This may put one at risk of incorrectly assuming measurement invariance to hold, and may also inflate estimated between-country differences in proficiency. The degree to which the RMSD is able to detect DIF in low-performing countries is studied using both an empirical example from PISA 2015 and a simulation study.
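The mechanism described above can be illustrated with a minimal numerical sketch. The RMSD weights the squared gap between the country-specific and international item response curves by the country's proficiency density, so for a hard item the gap falls in a region where a low-performing country has almost no probability mass. The item parameters and proficiency distributions below are hypothetical, chosen only to show the effect; this is not the operational PISA computation.

```python
import numpy as np

def irf(theta, a, b):
    """2PL item response function: P(correct | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def rmsd(theta, density, p_country, p_international):
    """RMSD item-fit statistic: squared deviations between the country-specific
    and international item response curves, weighted by the country's
    proficiency density."""
    w = density / density.sum()
    return np.sqrt(np.sum(w * (p_country - p_international) ** 2))

theta = np.linspace(-6.0, 6.0, 601)

# Hypothetical hard item with strong uniform DIF: the country-specific
# curve is one logit harder than the international curve.
p_int = irf(theta, a=1.0, b=2.0)
p_cty = irf(theta, a=1.0, b=3.0)

# The same DIF evaluated under two proficiency distributions: a
# low-performing country (mean -2) and a country whose proficiency is
# aligned with the item's difficulty (mean 2).
rmsd_low = rmsd(theta, normal_pdf(theta, -2.0, 1.0), p_cty, p_int)
rmsd_aligned = rmsd(theta, normal_pdf(theta, 2.0, 1.0), p_cty, p_int)
```

An identical one-logit DIF effect yields a much smaller RMSD for the low-performing country, so under any fixed flagging cutoff the misfitting item is far less likely to be detected there.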
Participation in international large-scale assessments has grown over time, with the largest, the Programme for International Student Assessment (PISA), including more than 70 education systems that are economically and educationally diverse. To accommodate large achievement differences among participants, in 2009 PISA offered low-performing systems the option of including an easier set of items in the assessment, with the aim of providing improved achievement estimates. However, there remains a lack of evidence on the performance of this design innovation. We therefore simulate a design that closely mirrors the PISA 2015 math assessment in order to empirically examine the benefits of including easy items for low-performing countries. We extend the PISA design to include larger numbers of easy items and items that are easier than those currently implemented. Findings show that the current PISA approach provides little advantage compared to a common test for all participants. Our study also demonstrates persistent bias, low coverage rates, and low correlations between generating and estimated proficiency under current designs. Through our simulation we also show that, to improve achievement estimation for low performers, about half of the items would need to be made substantially easier.
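The kind of bias this study targets can be sketched in a few lines: when a test is far too hard for a population, likelihood-based proficiency estimates carry little information and are pulled toward the population prior. The sketch below uses a simple Rasch model with EAP scoring and invented item pools (it does not reproduce the PISA design or its plausible-value machinery); `b_standard` and `b_easier` are hypothetical difficulty sets, with half of the easier pool shifted well below the standard range.

```python
import numpy as np

rng = np.random.default_rng(2015)

def simulate(theta, b):
    """Draw Rasch (1PL) item responses for abilities theta and difficulties b."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(float)

def eap(responses, b, grid=np.linspace(-6.0, 6.0, 121)):
    """Expected-a-posteriori proficiency estimates under a N(0, 1) prior."""
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - b[None, :])))  # (G, J)
    # Log-likelihood of each response pattern at each grid point: (N, G).
    ll = responses @ np.log(p).T + (1.0 - responses) @ np.log(1.0 - p).T
    post = np.exp(ll - ll.max(axis=1, keepdims=True)) * np.exp(-0.5 * grid**2)
    post /= post.sum(axis=1, keepdims=True)
    return post @ grid

# A low-performing population, true proficiency -2 on the reporting scale.
theta_true = np.full(4000, -2.0)

# Hypothetical pools: a standard test aimed at the international average
# (difficulties 0 to 3) versus a variant where half of the items are
# replaced by much easier ones (difficulties -3 to 0).
b_standard = np.linspace(0.0, 3.0, 30)
b_easier = np.concatenate([np.linspace(-3.0, 0.0, 15), np.linspace(0.0, 3.0, 15)])

bias_standard = eap(simulate(theta_true, b_standard), b_standard).mean() - (-2.0)
bias_easier = eap(simulate(theta_true, b_easier), b_easier).mean() - (-2.0)
```

Both estimates are shrunk toward the prior mean of 0, but replacing half of the pool with substantially easier items markedly reduces the upward bias for low performers, which is the intuition behind the paper's finding.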