Context: Recent reviews have claimed that the script concordance test (SCT) methodology generally produces reliable and valid assessments of clinical reasoning and that the SCT may soon be suitable for high-stakes testing.

Objectives: This study describes three major threats to the validity of the SCT not yet considered in prior research and illustrates the severity of these threats.

Methods: We conducted a review of SCT reports available through the Web of Science database. Additionally, we reanalysed scores from a previously published SCT administration to explore issues related to standard SCT scoring practice.

Results: Firstly, the predominant method for aggregate and partial-credit scoring of SCTs introduces logical inconsistencies into the scoring key. Secondly, our literature review shows that SCT reliability studies have generally ignored inter-panel, inter-panellist and test–retest measurement error. Instead, studies have focused on observed levels of coefficient alpha, which is neither an informative index of internal structure nor a comprehensive index of reliability for SCT scores. As such, claims that SCT scores show acceptable reliability are premature. Finally, SCT criteria for item inclusion, in concert with a statistical artefact of the SCT format, cause anchors at the extremes of the scale to carry less expected credit than anchors near or at the midpoint. Consequently, SCT scores are likely to reflect construct-irrelevant differences in examinees' response styles. This makes the test susceptible to bias against candidates who endorse extreme scale anchors more readily; it also makes two construct-irrelevant test-taking strategies extremely effective. In our reanalysis, we found that examinees could drastically increase their scores by never endorsing extreme scale points. Furthermore, examinees who simply endorsed the scale midpoint for every item would still have outperformed most examinees who used the scale as intended.

Conclusions: Given the severity of these threats, we conclude that aggregate scoring of SCTs cannot be recommended. Recommendations for revising SCT methodology are discussed.
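To make the scoring artefact concrete, the following sketch implements the commonly described aggregate (partial-credit) SCT scoring rule, in which the modal panel response earns full credit and every other anchor earns credit proportional to its panel endorsement. The panel responses and examinee answers below are invented for illustration; they are not data from this study.

```python
# Minimal sketch of the aggregate ("partial-credit") SCT scoring rule the abstract
# critiques. Panel and examinee responses are hypothetical, chosen for illustration.
from collections import Counter

ANCHORS = [-2, -1, 0, 1, 2]  # typical 5-point SCT scale

def scoring_key(panel_responses):
    """Credit per anchor = (number of panellists choosing it) / (modal frequency)."""
    counts = Counter(panel_responses)
    modal = max(counts.values())
    return {a: counts.get(a, 0) / modal for a in ANCHORS}

def score(examinee_responses, keys):
    """Aggregate score = sum of per-item credits."""
    return sum(key[r] for r, key in zip(examinee_responses, keys))

# Hypothetical 10-panellist responses for three items: panels rarely cluster at the
# scale extremes, so extreme anchors carry little expected credit.
panels = [
    [0, 0, 0, 1, 1, -1, 0, 1, 0, -1],
    [1, 1, 1, 0, 2, 1, 0, 1, 1, 0],
    [-1, -1, 0, -1, 0, -2, -1, 0, -1, -1],
]
keys = [scoring_key(p) for p in panels]

uses_extremes = [2, 2, -2]   # examinee willing to endorse extreme anchors
always_midpoint = [0, 0, 0]  # construct-irrelevant strategy: always pick the midpoint
print(score(uses_extremes, keys))    # ~0.33 of 3 possible points
print(score(always_midpoint, keys))  # 2.0 of 3 possible points
```

With these illustrative panels, an examinee who never leaves the midpoint earns far more credit than one who endorses extreme anchors, which is the response-style bias the abstract describes.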
Mastery learning is an instructional approach in which educational progress is based on demonstrated performance, not curricular time. Learners practice and retest repeatedly until they reach a designated mastery level; the final level of achievement is the same for all, although time to mastery may vary. Given the unique properties of mastery learning assessments, a thoughtful approach to establishing the performance levels and metrics that determine when a learner has demonstrated mastery is essential. Standard-setting procedures require modification when used for mastery learning settings in health care, particularly regarding the use of evidence-based performance data, the determination of appropriate benchmark or comparison groups, and consideration of patient safety consequences. Information about learner outcomes and past performance data of learners successful at the subsequent level of training can be more helpful than traditional information about test performance of past examinees. The marginally competent "borderline student" or "borderline group" referenced in traditional item-based and examinee-based procedures will generally need to be redefined in mastery settings. Patient safety considerations support conjunctive standards for key knowledge and skill subdomains and for items that have an impact on clinical outcomes. Finally, traditional psychometric indices used to evaluate the quality of standards do not necessarily reflect critical measurement properties of mastery assessments. Mastery learning and testing are essential to the achievement and assessment of entrustable professional activities and residency milestones. With careful attention, sound mastery standard-setting procedures can provide an essential step toward improving the effectiveness of health professions education, patient safety, and patient care.
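As a concrete illustration of the conjunctive standards endorsed above for safety-critical subdomains, the sketch below contrasts a compensatory decision (the overall mean meets a single cut) with a conjunctive one (every key subdomain meets its own cut). The subdomain names, cut scores, and learner profile are hypothetical.

```python
# Minimal sketch contrasting compensatory and conjunctive mastery decisions.
# Subdomains, cut scores, and the learner's scores are hypothetical examples.

subdomain_cuts = {
    "sterile_technique": 0.90,
    "anatomy_knowledge": 0.80,
    "complication_management": 0.85,
}

def compensatory_pass(scores, overall_cut=0.85):
    """Pass if the (unweighted) mean across subdomains meets a single overall cut."""
    return sum(scores.values()) / len(scores) >= overall_cut

def conjunctive_pass(scores, cuts=subdomain_cuts):
    """Pass only if every safety-critical subdomain meets its own cut."""
    return all(scores[d] >= c for d, c in cuts.items())

learner = {"sterile_technique": 0.70, "anatomy_knowledge": 0.95, "complication_management": 0.95}
print(compensatory_pass(learner))  # True  -- strong subdomains mask a safety-critical weakness
print(conjunctive_pass(learner))   # False -- learner returns to deliberate practice before retesting
```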
Because tests that do not alter management (i.e., influence decisions and actions) should not be performed, data on the consequences of assessment constitute a critical source of validity evidence. Consequences validity evidence is challenging for many educators to understand, perhaps because it has no counterpart in the older framework of content, criterion, and construct validity. The authors' purpose is to explain consequences validity evidence and propose a framework for organizing its collection and interpretation. Both clinical and educational assessments can be viewed as interventions. The act of administering or taking a test, the interpretation of scores, and the ensuing decisions and actions influence those being assessed (e.g., patients or students) and other people and systems (e.g., physicians, teachers, hospitals, schools). Consequences validity evidence examines such impacts of assessments. Despite its importance, consequences evidence is reported infrequently in health professions education (in 5%-20% of studies in recent systematic reviews) and is typically limited in scope and rigor. Consequences validity evidence can derive from evaluations of the impact on examinees, educators, schools, or the end target of practice (e.g., patients or health care systems), and from the downstream impact of classifications (e.g., different score cut points and labels). Impact can result from the uses of scores or from the assessment activity itself, and can be intended or unintended and beneficial or harmful. Both quantitative and qualitative research methods are useful. The type, quantity, and rigor of consequences evidence required will vary depending on the assessment and the claims for its use.
Learning curves can support a competency-based approach to assessment for learning. When interpreting repeated assessment data displayed as learning curves, a key assessment question is: "How well is each learner learning?" We outline the validity argument and investigation relevant to this question for a computer-based repeated assessment of competence in electrocardiogram (ECG) interpretation. We developed an online ECG learning program based on 292 anonymized ECGs collected from an electronic patient database. After diagnosing each ECG, participants received feedback including the computer interpretation, the cardiologist's annotation, and the correct diagnosis. In 2015, participants from a single institution, across a range of ECG skill levels, each diagnosed at least 60 ECGs. We planned, collected and evaluated validity evidence under each inference of Kane's validity framework. For Scoring, three cardiologists' kappa for agreement on the correct diagnosis was 0.92. There was a range of ECG difficulty across and within each diagnostic category. For Generalization, appropriate sampling was reflected in the inclusion of a typical clinical base rate of 39% normal ECGs. Applying generalizability theory presented unique challenges. Under the Extrapolation inference, group learning curves demonstrated expert-novice differences, performance increased with practice, and the incremental phase of the learning curve reflected ongoing, effortful learning. A minority of learners had atypical learning curves. We did not collect Implications evidence. Our results support a preliminary validity argument for a learning curve assessment approach for repeated ECG interpretation with deliberate and mixed practice. This approach holds promise for providing educators and researchers, in collaboration with their learners, with deeper insights into how well each learner is learning.
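One way to operationalise "how well is each learner learning?" is to fit a simple curve to a learner's accuracy over successive cases and inspect its plateau and rate parameters. The sketch below fits a negative-exponential learning curve to simulated accuracy data; it illustrates the general approach, not the program's actual analysis.

```python
# Minimal sketch of fitting an individual learning curve to repeated assessment
# scores. The accuracy data here are simulated, not from the ECG study.
import numpy as np
from scipy.optimize import curve_fit

def neg_exponential(trial, asymptote, gain, rate):
    """y = asymptote - gain * exp(-rate * trial): a common learning-curve form."""
    return asymptote - gain * np.exp(-rate * trial)

trials = np.arange(1, 61)  # 60 practised ECGs per learner
rng = np.random.default_rng(0)
accuracy = neg_exponential(trials, 0.85, 0.45, 0.08) + rng.normal(0, 0.05, trials.size)

params, _ = curve_fit(neg_exponential, trials, accuracy, p0=[0.8, 0.4, 0.1])
asymptote, gain, rate = params
print(f"estimated plateau = {asymptote:.2f}, learning rate = {rate:.3f}")
# A flat or declining fitted curve would flag the kind of atypical learner the abstract mentions.
```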
Research into education for international medical graduates (IMGs) is critically important but currently underdeveloped. An abundance of justification studies and a lack of clarification studies parallel other areas of medical education. Academic fields outside medical education, such as cross-cultural psychology and expatriate management, are highly relevant; researchers from these areas should be sought as collaborators. Future research should employ conceptual frameworks in order to facilitate a broader, more nuanced consideration of the diversity of individual IMGs, educational and medical contexts, interventions, and outcomes. Rigorous comparative effectiveness research is lacking but represents a promising avenue for future scholarship.
Medical schools and certification agencies should consider the implications of assigning weights for composite score reliability and the consequences for pass-fail decisions.
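For readers unfamiliar with the mechanics, the sketch below shows how the reliability of a weighted composite changes with the chosen weights, using the standard formula that combines component reliabilities, standard deviations, and intercorrelations (and assumes uncorrelated errors). All numbers are hypothetical.

```python
# Minimal sketch of weighted composite score reliability. Component reliabilities,
# SDs, and correlations are hypothetical, chosen only to illustrate the formula.
import numpy as np

rel = np.array([0.90, 0.75, 0.60])   # component reliabilities (e.g., MCQ, OSCE, oral exam)
sd = np.array([10.0, 8.0, 6.0])      # component score standard deviations
corr = np.array([[1.0, 0.5, 0.4],
                 [0.5, 1.0, 0.3],
                 [0.4, 0.3, 1.0]])    # observed-score correlations between components

def composite_reliability(weights):
    """1 - (weighted error variance) / (variance of the weighted composite)."""
    w_sd = weights * sd
    var_composite = w_sd @ corr @ w_sd            # variance of the weighted composite
    error_var = np.sum(w_sd ** 2 * (1 - rel))     # sum of weighted component error variances
    return 1 - error_var / var_composite

print(composite_reliability(np.array([1/3, 1/3, 1/3])))  # equal weights (~0.89 here)
print(composite_reliability(np.array([0.6, 0.3, 0.1])))  # favours the most reliable component (~0.91)
```

Shifting weight toward more reliable components raises composite reliability, but it also changes which candidates fall on either side of the cut score, which is why the pass-fail consequences deserve separate scrutiny.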
There is initial validity evidence for use of this rubric to score local clinical exams that are based on the new USMLE patient note format.
Introduction: As patient volumes continue to increase, more attention must be paid to skills that foster efficiency without sacrificing patient safety. The emergency department is fertile ground for examining leadership and management skills, especially those that concern prioritization in multi-patient environments. We sought to understand the needs of emergency physicians (EPs) and emergency medicine junior trainees with regard to teaching and learning about how best to handle busy, multi-patient environments.

Method: A cognitive task analysis was undertaken, using a qualitative approach to elicit the knowledge of EPs and residents about handling busy emergency department situations. Ten experienced EPs and 10 junior emergency medicine residents were interviewed about their experiences in busy emergency departments. Transcripts of the interviews were analyzed inductively and iteratively by two independent coders using an interpretive description technique.

Results: EP teachers and junior residents differed in their perceptions of what makes an emergency department busy. Moreover, they focused on different aspects of patient care that contributed to their busyness: EP teachers tended to focus on the volume of patients, whereas junior residents tended to focus on the complexity of certain cases. The most important barrier to effective teaching and learning of managerial skills was thought to be the lack of faculty development in this skill set.

Conclusions: This study presents qualitative data that help us elucidate how patient volumes affect our learning environments and how clinical teachers and residents operate within these environments.