2022
DOI: 10.1177/00131644221105819

A Robust Method for Detecting Item Misfit in Large-Scale Assessments

Abstract: Viable methods for the identification of item misfit or Differential Item Functioning (DIF) are central to scale construction and sound measurement. Many approaches rely on the derivation of a limiting distribution under the assumption that a certain model fits the data perfectly. Typical DIF assumptions such as the monotonicity and population independence of item functions are present even in classical test theory but are more explicitly stated when using item response theory or other latent variable models f…

Cited by 9 publications (8 citation statements)
References 43 publications
“…Some researchers argue that one should not fit more complex IRT models than the 2PL model, such as the three-parameter logistic (3PL) IRT model. They argue that at most two item parameters can be identified from multivariate data [75] and base their argument on a result of the Dutch identity of Holland [155]. However, Zhang and Stout [156] disproved the finding.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, to disentangle the factor of the definition of DIF items from other model specification factors in the multiverse analysis, we decided to let the DIF item sets be the same across specifications. Note that the PI approach is practically equivalent to a robust linking approach in which the impact of some items is downweighted (or entirely removed) for a particular country [75, 78, 80].…”
Section: Methods (mentioning)
confidence: 99%
“…This type of country-to-country item-level variation has been documented even with highly engineered educational surveys such as the Programme for International Student Assessment (e.g., Zwitser, Glaser, & Maris, 2017), and initial research points to similar concerns with assessments of ECD (e.g., Halpin et al., 2019; Waldman et al., 2021). Psychometric methodology for international comparisons remains an active area of research (e.g., von Davier & Bezirhan, 2022; Halpin, 2022) and can directly inform ongoing efforts in ECD assessment.…”
Section: Validation Studies (mentioning)
confidence: 95%
“…It is important to comment on other approaches to quantifying DIF and explain our reasoning for our use of NCDIF* in this context. Seemingly the most popular recent approach applied in large scale international assessment for evaluating DIF at the country-level is the root-mean squared deviation (RMSD; von Davier, 2017) index, which for a given country g, evaluates the empirical conditional probabilities observed for students in the country g about the expected (model-based) conditional probability, typically defined from all countries. The RMSD index for a given item i in country g can be written as but where discrete approximation of P_{ig,o}(θ) is based on empirically observed (rather than model-based) conditional probabilities for item i in group g. P_{i,e}(θ) is obtained by using the model-based estimated item parameters.…”
Section: Quantifying Two-group and Multiple Group Dif (mentioning)
confidence: 99%
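
The equation in the last citation statement did not survive extraction. As a rough illustration only, the RMSD index is commonly approximated on a grid of ability points θ_q as the square root of the weighted average of squared differences between the observed probability P_{ig,o}(θ_q) and the model-based probability P_{i,e}(θ_q). The sketch below assumes that form. The 2PL item function, the grid, the weights, and the item parameters are all hypothetical choices made for the example and are not taken from the cited papers.

```python
import math


def irf_2pl(theta, a, b):
    """Model-based probability of a correct response under a 2PL model.

    The 2PL form is an assumption for this example; the cited work may
    use a different item response model.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rmsd(observed, expected, weights):
    """Weighted root-mean-squared deviation on a common theta grid.

    observed: empirical conditional probabilities for item i in group g
    expected: model-based conditional probabilities for item i
    weights:  non-negative weights approximating the group's ability density
    """
    if not (len(observed) == len(expected) == len(weights)):
        raise ValueError("observed, expected and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    squared = sum(w * (o - e) ** 2 for o, e, w in zip(observed, expected, weights))
    return math.sqrt(squared / total)


# Hypothetical grid and weights (not from the source).
theta_grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [0.05, 0.25, 0.40, 0.25, 0.05]

# Expected probabilities from hypothetical item parameters a=1.0, b=0.0.
expected = [irf_2pl(t, a=1.0, b=0.0) for t in theta_grid]

# Hypothetical observed probabilities: a uniform upward shift of 0.1.
observed = [p + 0.1 for p in expected]

rmsd_value = rmsd(observed, expected, weights)
print(round(rmsd_value, 6))
```

Because the observed curve here is a constant shift of the expected curve, the index equals the size of the shift regardless of the weights, which makes the example easy to check by hand.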