2015
DOI: 10.12738/estp.2015.5.2505

Effect of Differential Item Functioning on Test Equating

Abstract: This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performance of three different equating models was investigated under 24 different simulation conditions; the variables examined were sample size, test length, DIF magnitude, and test type. The MIRMs, in which the DIF factors were added as parameters, were compared with the Stocking-Lord (SL) method (one of the IR…
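The Stocking-Lord (SL) method named in the abstract chooses linking constants A and B that minimize the squared difference between the test characteristic curves of two forms over their common items. A minimal sketch of that criterion under the 2PL, using purely illustrative item parameter values (not the study's data), might look like this:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 2PL parameters for a set of common (anchor) items, estimated
# separately on the base form (a_base, b_base) and the new form (a_new, b_new).
a_base = np.array([1.0, 1.2, 0.8, 1.5])
b_base = np.array([-0.5, 0.0, 0.7, 1.1])
a_new  = np.array([1.1, 1.3, 0.7, 1.4])
b_new  = np.array([-0.3, 0.2, 0.9, 1.3])

theta = np.linspace(-4, 4, 81)  # quadrature points on the ability scale

def tcc(a, b, theta):
    """Test characteristic curve: expected number-correct score under the 2PL."""
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    return p.sum(axis=1)

def stocking_lord_loss(params):
    """Squared TCC difference after rescaling the new-form parameters by (A, B)."""
    A, B = params
    a_star = a_new / A          # slope transformation
    b_star = A * b_new + B      # location transformation
    return np.sum((tcc(a_base, b_base, theta) - tcc(a_star, b_star, theta)) ** 2)

result = minimize(stocking_lord_loss, x0=[1.0, 0.0], method="Nelder-Mead")
A_hat, B_hat = result.x
print(f"Stocking-Lord linking constants: A = {A_hat:.3f}, B = {B_hat:.3f}")
```

DIF in the common items distorts exactly these curves, which is why the study contrasts SL equating with MIRMs that model the DIF effects directly.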

Cited by 3 publications (2 citation statements) · References 20 publications

“…The purpose of this simulation study was to investigate the performance of linking methods in the two-group case for the 2PL model under different sample sizes, different numbers of items, and different amounts of uniform and nonuniform DIF. Most simulation studies either assume invariant item parameters (i.e., no DIF) or presuppose partial invariance in which only a few item parameters differ between groups (e.g., [63, 105–112]). There is a lack of research in the presence of random DIF, although there is some initial work for continuous items [88, 89].…”
Section: Purpose (mentioning)
confidence: 99%
“…Tests of sex invariance with the bifactor model indicated that item loadings for the common factor showed significant violations of both metric and scalar invariance (see Table 1). Violations of invariance assumptions can in principle bias scores and score equating (Kabasakal & Kelecioğlu, 2015), but statistically significant bias may not in fact be strong enough to make a practical difference (Wanders et al., 2015). To assess the seriousness of these violations, we used Mplus to calculate empirical Bayes estimates of the common factor scores for the basic single factor model and the configural model, regressing the former on the latter separately for females and males.…”
Section: Testing Construct Equivalence in Measures of Youth Depression (mentioning)
confidence: 99%
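The practical check this excerpt describes amounts to regressing the factor scores from the restricted model on those from the configural model within each group and comparing the slopes and intercepts; a near-identity relationship in both groups suggests the invariance violations have little practical effect. A minimal sketch with simulated (not real) score estimates standing in for the Mplus SAVEDATA output:

```python
import numpy as np

# Hypothetical empirical Bayes factor-score estimates from two models,
# plus a group indicator; in practice these would be read from Mplus output.
rng = np.random.default_rng(1)
n = 500
group = rng.integers(0, 2, n)                  # 0 = female, 1 = male (labels illustrative)
scores_configural = rng.normal(0, 1, n)
scores_single = 0.95 * scores_configural + rng.normal(0, 0.2, n)

for g, label in [(0, "female"), (1, "male")]:
    x = scores_configural[group == g]
    y = scores_single[group == g]
    slope, intercept = np.polyfit(x, y, 1)     # within-group regression of one score on the other
    print(f"{label}: slope = {slope:.3f}, intercept = {intercept:.3f}")
```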