Automated essay scoring systems yield scores that consistently agree with those of human raters at a level as high, if not higher, as the level of agreement among human raters themselves. The system offers medical educators many benefits for scoring constructed-response tasks, such as improving the consistency of scoring, reducing the time required for scoring and reporting, minimising the costs of scoring, and providing students with immediate feedback on constructed-response tasks.
Previous research on AIG highlighted how this item development method can be used to produce high-quality stems and correct options for MCQ exams. The purpose of the current study was to describe, illustrate, and evaluate a method for modeling plausible but incorrect options. Evidence provided in this study demonstrates that AIG can produce psychometrically sound test items. More important, by adapting the distractors to match the unique features presented in the stem and correct option, the generation of MCQs using automated procedure has the potential to produce plausible distractors and yield large numbers of high-quality items for medical education.
We present a framework for technology-enhanced scoring of bilingual clinical decision-making (CDM) questions using an open-source scoring technology and evaluate the strength of the proposed framework using operational data from the Medical Council of Canada Qualifying Examination. Candidates' responses from six write-in CDM questions were used to develop a three-stage-automated scoring framework. In Stage 1, the linguistic features from CDM responses were extracted. In Stage 2, supervised
Item disclosure is one of the most serious threats to the validity of high stakes examinations, and identifying examinees that may have had unauthorized access to material is an important step in ensuring the integrity of an examination. A procedure was developed to identify examinees that potentially had unauthorized prior access to examination content. A standardized difference score is created by comparing examinee ability estimates for potentially exposed items to ability estimates for unexposed items. Outliers in this distribution are then flagged for further review. The steps associated with this procedure are described and followed by an example of applying the procedure. In addition, the use of this procedure is supported by the results of a simulation that models the use of unauthorized access to examination material.
Purpose:The aim of this research was to compare different methods of calibrating multiple choice question (MCQ) and clinical decision making (CDM) components for the Medical Council of Canada’s Qualifying Examination Part I (MCCQEI) based on item response theory.Methods:Our data consisted of test results from 8,213 first time applicants to MCCQEI in spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All 3 mixed item format (dichotomous MCQ responses and polytomous CDM case scores) calibrations were conducted using PARSCALE 4.Results:The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499 or 0.02). In all 3 polytomous models, whether the MCQs were either anchored or concurrently run with the CDM cases, results suggest very poor fit. All IRT abilities estimated from dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods, but also with regard to the actual reported decision to candidates. The largest difference noted in pass rates was 4.78%, which occurred between the mixed format concurrent 2-PL graded response model (pass rate= 80.43%) and the dichotomous anchored 1-PL calibrations (pass rate= 85.21%).Conclusion:Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided better fit of the item response matrix than more complex, polytomous calibrations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.