2020
DOI: 10.1080/08957347.2020.1732384
The Trade-Off between Model Fit, Invariance, and Validity: The Case of PISA Science Assessments

Abstract: In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item functioning (DIF) of items post hoc. This typically requires a choice among retaining an item as it is despite its DIF, deleting the item, or resolving (splitting) an item by c…

Cited by 18 publications (17 citation statements)
References 33 publications (44 reference statements)
“…In contrast, using p = 2 corresponds to a quadratic loss function, and all items contribute to the computation of group means. Several researchers have argued that the decision to eliminate items from group comparisons should be (mainly) driven by substantive considerations (see [94, 105–110]).…”
Section: Discussion (mentioning)
confidence: 99%
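The contrast drawn in the statement above, between a quadratic loss (p = 2), where every item contributes to the group-mean estimate, and a smaller power that downweights items with large DIF, can be made concrete with a small numeric sketch. The item-wise differences and the choice of p = 1 as the robust alternative below are illustrative assumptions, not values taken from the cited studies.

```python
# Minimal sketch (illustrative values, not from the cited studies): how the
# power p in a linking loss function changes which items drive the estimated
# group-mean shift. The last item carries large DIF.
import numpy as np
from scipy.optimize import minimize_scalar

d = np.array([0.05, -0.10, 0.02, 0.08, -0.04, 1.50])  # item-wise group differences

def shift(p):
    """Estimate the shift delta by minimizing sum_i |d_i - delta| ** p."""
    loss = lambda delta: np.sum(np.abs(d - delta) ** p)
    return minimize_scalar(loss, bounds=(d.min(), d.max()), method="bounded").x

print(f"p = 2 (quadratic, all items contribute): {shift(2.0):+.3f}")  # near the mean, pulled by the DIF item
print(f"p = 1 (robust, DIF item downweighted):   {shift(1.0):+.3f}")  # near the median of the bulk
```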
“…Camilli (1993) pointed out (see also Penfield & Camilli, 2007) that expert reviews of items showing DIF should accompany DIF detection procedures. Only those items should be excluded from country comparisons for which it is justifiable to argue that construct-irrelevant factors caused DIF (see also El Masri & Andrich, 2020; Zwitser et al., 2017). However, the purely statistical approach since PISA 2015 based on partial invariance disregards that DIF items could be construct-relevant.…”
Section: Country DIF and Cross-Sectional Country Comparisons (mentioning)
confidence: 99%
“…However, one critical aspect of the partial invariance approach (as well as other approaches that result in the removal or downweighting of the contribution of particular items, such as robust Haberman or robust Haebara linking) is that comparisons of different groups rely on different sets of items. We regard this feature as a potential threat to validity and find this practice problematic because it compares apples with oranges (see also El-Masri & Andrich, 2020). For example, the comparison of the country means for Germany with those for Italy in PISA does not involve a full set of common item parameters for each country if the sets of country-specific noninvariant items, which receive country-specific item parameters, differ between the two countries.…”
Section: Mean Comparisons of Many Groups in the Presence of DIF (mentioning)
confidence: 99%
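The concern raised in this statement, that under partial invariance different country pairs are effectively compared on different sets of commonly parameterized items, can be illustrated with a toy example. The item labels and the country-specific noninvariant sets below are hypothetical and chosen only to show the mechanism.

```python
# Toy illustration (hypothetical items and noninvariant sets): under partial
# invariance, each country's noninvariant items receive country-specific
# parameters, so the items anchoring a pairwise comparison differ by pair.
items = {f"i{k}" for k in range(1, 11)}                          # 10 hypothetical items
noninvariant = {"DEU": {"i3", "i7"}, "ITA": {"i2", "i7", "i9"}}  # country-specific DIF items

common = {c: items - s for c, s in noninvariant.items()}         # items with international parameters per country
anchors_DEU_ITA = common["DEU"] & common["ITA"]                  # items anchoring the DEU-ITA comparison
print(sorted(anchors_DEU_ITA))                                   # only 6 of the 10 items remain common to both
```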