2020
DOI: 10.1080/08957347.2020.1732384
The Trade-Off between Model Fit, Invariance, and Validity: The Case of PISA Science Assessments

Abstract: In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item functioning (DIF) of items post hoc. This typically requires a choice among retaining an item as it is despite its DIF, deleting the item, or resolving (splitting) an item by c…

Cited by 18 publications (17 citation statements)
References 33 publications (44 reference statements)
“…In contrast, using p = 2 corresponds to a quadratic loss function, and all items contribute to the computation of group means. Several researchers have argued that the decision to eliminate items from group comparisons should be (mainly) driven by substantive considerations (see [94, 105–110]).…”
Section: Discussion (mentioning)
confidence: 99%
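The contrast drawn in the statement above, between a quadratic loss (p = 2), where every item contributes to the group-mean estimate, and a smaller power that downweights items with large DIF, can be made concrete with a small numeric sketch. The item-wise differences and the choice of p = 1 as the robust alternative below are illustrative assumptions, not values taken from the cited studies.

```python
# Minimal sketch (illustrative values, not from the cited studies): how the
# power p in a linking loss function changes which items drive the estimated
# group-mean shift. The last item carries large DIF.
import numpy as np
from scipy.optimize import minimize_scalar

d = np.array([0.05, -0.10, 0.02, 0.08, -0.04, 1.50])  # item-wise group differences

def shift(p):
    """Estimate the shift delta by minimizing sum_i |d_i - delta| ** p."""
    loss = lambda delta: np.sum(np.abs(d - delta) ** p)
    return minimize_scalar(loss, bounds=(d.min(), d.max()), method="bounded").x

print(f"p = 2 (quadratic, all items contribute): {shift(2.0):+.3f}")  # near the mean, pulled by the DIF item
print(f"p = 1 (robust, DIF item downweighted):   {shift(1.0):+.3f}")  # near the median of the bulk
```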
“…Camilli (1993) pointed out (see also Penfield & Camilli, 2007) that expert reviews of items showing DIF should accompany DIF detection procedures. Only those items should be excluded from country comparisons for which it is justifiable to argue that construct-irrelevant factors caused DIF (see also El Masri & Andrich, 2020; Zwitser et al., 2017). However, the purely statistical approach since PISA 2015 based on partial invariance disregards that DIF items could be construct-relevant.…”
Section: Country DIF and Cross-Sectional Country Comparisons (mentioning)
confidence: 99%
“…However, one critical aspect of the partial invariance approach (as well as other approaches that result in the removal or downweighting of the contribution of particular items, such as robust Haberman or robust Haebara linking) is that comparisons of different groups rely on different sets of items. We regard this feature as a potential threat to validity and find this practice problematic because it compares apples with oranges (see also El-Masri & Andrich, 2020). For example, the comparison of the country means for Germany with those for Italy in PISA does not involve a full set of common item parameters for each country if the sets of country-specific noninvariant items, which receive country-specific item parameters, differ between the two countries.…”
Section: Mean Comparisons of Many Groups in the Presence of DIF (mentioning)
confidence: 99%
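The concern raised in this statement, that under partial invariance different country pairs are effectively compared on different sets of commonly parameterized items, can be illustrated with a toy example. The item labels and the country-specific noninvariant sets below are hypothetical and chosen only to show the mechanism.

```python
# Toy illustration (hypothetical items and noninvariant sets): under partial
# invariance, each country's noninvariant items receive country-specific
# parameters, so the items anchoring a pairwise comparison differ by pair.
items = {f"i{k}" for k in range(1, 11)}                          # 10 hypothetical items
noninvariant = {"DEU": {"i3", "i7"}, "ITA": {"i2", "i7", "i9"}}  # country-specific DIF items

common = {c: items - s for c, s in noninvariant.items()}         # items with international parameters per country
anchors_DEU_ITA = common["DEU"] & common["ITA"]                  # items anchoring the DEU-ITA comparison
print(sorted(anchors_DEU_ITA))                                   # only 6 of the 10 items remain common to both
```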