CONTEXT A test score is a number which purportedly reflects a candidate's proficiency in some clearly defined knowledge or skill domain. A test theory model is necessary to help us better understand the relationship that exists between the observed (or actual) score on an examination and the underlying proficiency in the domain, which is generally unobserved. Common test theory models include classical test theory (CTT) and item response theory (IRT). The widespread use of IRT models over the past several decades attests to their importance in the development and analysis of assessments in medical education. Item response theory models are used for a host of purposes, including item analysis, test form assembly and equating. Although helpful in many circumstances, IRT models make fairly strong assumptions and are mathematically much more complex than CTT models. Consequently, there are instances in which it might be more appropriate to use CTT, especially when common assumptions of IRT cannot be readily met, or in more local settings, such as those that may characterise many medical school examinations.
OBJECTIVES The objective of this paper is to provide an overview of both CTT and IRT to the practitioner involved in the development and scoring of medical education assessments.
METHODS The tenets of CTT and IRT are initially described. Then, the main uses of both models in test development and psychometric activities are illustrated via several practical examples. Finally, general recommendations pertaining to the use of each model in practice are outlined.
DISCUSSION Classical test theory and IRT are widely used to address measurement-related issues that arise from commonly used assessments in medical education, including multiple-choice examinations, objective structured clinical examinations, ward ratings and workplace evaluations. The present paper provides an introduction to these models and how they can be applied to answer common assessment questions.
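The contrast the abstract draws between the two frameworks can be made concrete with a minimal sketch (not taken from the article itself): under CTT, item difficulty is simply the proportion of candidates answering correctly, whereas under the two-parameter logistic (2PL) IRT model, the probability of a correct response depends on the candidate's latent ability and the item's discrimination and difficulty parameters. All names and parameter values below are illustrative.

```python
import math

def p_correct_2pl(theta, a, b):
    """2PL IRT model: probability that a candidate with latent ability
    `theta` answers correctly an item with discrimination `a` and
    difficulty `b`. When theta == b, the probability is exactly 0.5."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ctt_difficulty(item_scores):
    """CTT difficulty (p-value): mean of the 0/1 scores observed on an
    item across candidates. Unlike IRT parameters, this depends on the
    particular sample of candidates tested."""
    return sum(item_scores) / len(item_scores)

# A candidate whose ability matches the item difficulty has a 50% chance:
print(p_correct_2pl(theta=0.0, a=1.0, b=0.0))   # 0.5
# Classical p-value for an item answered correctly by 3 of 4 candidates:
print(ctt_difficulty([1, 0, 1, 1]))             # 0.75
```

The sample dependence of `ctt_difficulty` is one reason the abstract notes that IRT, whose item parameters are (in theory) invariant across candidate samples, is favoured for equating test forms.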
An approximate χ2 statistic based on McDonald's (1967) nonlinear factor analytic representation of item response theory was proposed and investigated with simulated data. The results were compared with Stout's T statistic (Nandakumar & Stout, 1993; Stout, 1987). Unidimensional and two‐dimensional item response data were simulated under varying levels of sample size, test length, test reliability, and dimension dominance. The approximate χ2 statistic had good control over Type I errors when unidimensional data were generated and displayed very good power in identifying the two‐dimensional data. The performance of the approximate χ2 was at least as good as Stout's T statistic in all conditions and was better than Stout's T statistic with smaller sample sizes and shorter tests. Further implications regarding the potential use of nonlinear factor analysis and the approximate χ2 in addressing current measurement issues are discussed.
Recently, standardized patient assessments and objective structured clinical examinations have been used for high-stakes certification and licensure decisions. In these testing situations, it is important that the assessments are standardized, the scores are accurate and reliable, and the resulting decisions regarding competence are equitable and defensible. For the decisions to be valid, justifiable standards, or cut-scores, must be set. Unfortunately, unlike the body of research specifically dedicated to multiple-choice examinations, relatively little research has been conducted on standard-setting methods appropriate for use with performance-based assessments. The purpose of this article is to provide the reader with some guidance on how to set defensible standards on performance assessments, especially those that utilize standardized patients in simulated medical encounters. Various methods are discussed and contrasted, highlighting the relevant strengths and weaknesses. In addition, based on the prevailing literature and research, ideas for future studies and potential augmentations to current performance-based standard setting protocols are advanced.
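As a hedged illustration of one of the standard-setting approaches this literature contrasts (the sketch is not drawn from the article): in an Angoff-style procedure, each judge estimates, for every item or station, the probability that a minimally competent candidate would succeed, and the cut score is the sum of the mean per-item estimates. The data below are invented for demonstration.

```python
def angoff_cut_score(judgments):
    """Angoff-style cut score. `judgments` is a list of per-judge lists,
    each giving that judge's estimated success probability for a
    minimally competent candidate on each item. The cut score is the
    sum over items of the mean estimate across judges."""
    n_judges = len(judgments)
    n_items = len(judgments[0])
    return sum(
        sum(judge[i] for judge in judgments) / n_judges
        for i in range(n_items)
    )

# Two judges rating a two-item assessment (probabilities are invented):
panel = [
    [0.6, 0.8],   # judge 1's per-item estimates
    [0.4, 0.6],   # judge 2's per-item estimates
]
print(angoff_cut_score(panel))   # item means 0.5 and 0.7, cut score 1.2
```

A candidate scoring at or above the cut score (here, 1.2 of 2 possible points) would be classified as meeting the standard; the article's point is that whether such item-centred methods transfer well to performance-based assessments remains an open research question.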