In recent years there has been an increasing emphasis on assessment results, as well as increasing concern about the nature of the most widely used forms of student assessment and the uses made of the results. These conflicting forces have helped create a burgeoning interest in alternative forms of assessment, particularly complex, performance-based assessments. It is argued that there is a need to rethink the criteria by which the quality of educational assessments is judged, and a set of criteria sensitive to some of the expectations for performance-based assessments is proposed.
The purpose of this study was to investigate the power and Type I error rate of the likelihood ratio goodness-of-fit (LR) statistic in detecting differential item functioning (DIF) under Samejima's (1969, 1972) graded response model. A multiple-replication Monte Carlo study was conducted in which DIF was modeled in simulated data sets that were then calibrated with MULTILOG (Thissen, 1991) using hierarchically nested item response models. For comparative purposes, the power and Type I error rate of the Mantel (1963) approach for detecting DIF in ordered response categories were investigated using the same simulated data. The power of both the Mantel and LR procedures was affected by sample size, as expected. The LR procedure lacked the power to consistently detect DIF when it existed in reference/focal groups with sample sizes as small as 500/500. The Mantel procedure maintained control of its Type I error rate and was more powerful than the LR procedure when the comparison group ability distributions were identical and there was a constant DIF pattern. On the other hand, the Mantel procedure lost control of its Type I error rate, whereas the LR procedure did not, when the comparison groups differed in mean ability; and the LR procedure demonstrated a profound power advantage over the Mantel procedure under conditions of balanced DIF in which the comparison group ability distributions were identical. The choice and subsequent use of either procedure requires a thorough understanding of its power and Type I error rate under varying conditions of DIF pattern, comparison group ability distributions (or, as a surrogate, observed score distributions), and item characteristics.
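As a minimal sketch of the kind of data generation this design implies, the snippet below simulates graded responses under Samejima's model for a reference group and for a focal group whose thresholds are uniformly shifted (a constant-DIF pattern). The item parameters, sample sizes, and threshold shift are illustrative assumptions, not the study's actual simulation design, and the calibration step itself (performed with MULTILOG in the study) is omitted.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one item under Samejima's graded response model.

    theta : scalar ability
    a     : item discrimination
    b     : ordered category thresholds (length K-1 for K categories)
    """
    # Boundary probabilities P(X >= k | theta) for k = 1..K-1
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b))))
    # Bracket with P(X >= 0) = 1 and P(X >= K) = 0, then take differences
    bounds = np.concatenate(([1.0], p_star, [0.0]))
    return bounds[:-1] - bounds[1:]

rng = np.random.default_rng(0)

# Hypothetical four-category item: reference-group parameters, plus a
# focal-group version with a uniform threshold shift (constant DIF)
a_ref = 1.2
b_ref = np.array([-1.0, 0.0, 1.0])
b_foc = b_ref + 0.5

def simulate(n, a, b, mean_ability=0.0):
    """Draw n abilities from N(mean_ability, 1) and sample graded responses."""
    thetas = rng.normal(mean_ability, 1.0, n)
    return np.array([rng.choice(len(b) + 1, p=grm_category_probs(t, a, b))
                     for t in thetas])

ref_responses = simulate(500, a_ref, b_ref)   # reference group, N = 500
foc_responses = simulate(500, a_ref, b_foc)   # focal group with DIF, N = 500
```

Shifting the focal group's mean ability in `simulate` (e.g., `mean_ability=-0.5`) reproduces the unequal-ability-distribution condition under which the abstract reports the Mantel procedure losing control of its Type I error rate.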