André-Philippe Boulais scite author profile

Automated essay scoring systems yield scores that consistently agree with those of human raters at a level as high as, if not higher than, the level of agreement among human raters themselves. These systems offer medical educators many benefits for scoring constructed-response tasks, such as improving the consistency of scoring, reducing the time required for scoring and reporting, minimising the costs of scoring, and providing students with immediate feedback on constructed-response tasks.


Evaluating the Psychometric Characteristics of Generated Multiple-Choice Test Items

Gierl, Lai, Pugh, et al. 2016

Applied Measurement in Education


Using Automatic Item Generation to Improve the Quality of MCQ Distractors

Lai, Gierl, Touchie, et al. 2016

Teaching and Learning in Medicine


Previous research on automatic item generation (AIG) highlighted how this item development method can be used to produce high-quality stems and correct options for MCQ exams. The purpose of the current study was to describe, illustrate, and evaluate a method for modeling plausible but incorrect options. Evidence provided in this study demonstrates that AIG can produce psychometrically sound test items. More important, by adapting the distractors to match the unique features presented in the stem and correct option, the generation of MCQs using an automated procedure has the potential to produce plausible distractors and yield large numbers of high-quality items for medical education.


Using Automated Scoring to Evaluate Written Responses in English and French on a High-Stakes Clinical Competency Examination

et al. 2015


We present a framework for technology-enhanced scoring of bilingual clinical decision-making (CDM) questions using an open-source scoring technology and evaluate the strength of the proposed framework using operational data from the Medical Council of Canada Qualifying Examination. Candidates' responses from six write-in CDM questions were used to develop a three-stage-automated scoring framework. In Stage 1, the linguistic features from CDM responses were extracted. In Stage 2, supervised


Identifying the Unauthorized Use of Examination Material

et al. 2009


Item disclosure is one of the most serious threats to the validity of high-stakes examinations, and identifying examinees who may have had unauthorized access to material is an important step in ensuring the integrity of an examination. A procedure was developed to identify examinees who potentially had unauthorized prior access to examination content. A standardized difference score is created by comparing examinee ability estimates for potentially exposed items to ability estimates for unexposed items. Outliers in this distribution are then flagged for further review. The steps associated with this procedure are described and followed by an example of applying the procedure. In addition, the use of this procedure is supported by the results of a simulation that models the use of unauthorized access to examination material.
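The flagging idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the difference score is standardized or where the outlier cutoff sits, so everything specific here is an assumption: the function names, the choice to standardize each examinee's difference against the cohort mean and sample standard deviation, and the cutoff of 2.0 are hypothetical and are not taken from the paper.

```python
import statistics


def standardized_differences(theta_exposed, theta_unexposed):
    """Return one z-score per examinee for (exposed - unexposed) ability.

    Assumption: differences are standardized against the cohort's mean and
    sample standard deviation. The published procedure may standardize
    differently (for example, using standard errors of the estimates).
    """
    diffs = [e - u for e, u in zip(theta_exposed, theta_unexposed)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return [(d - mean) / sd for d in diffs]


def flag_outliers(z_scores, threshold=2.0):
    """Return indices of examinees whose z-score exceeds the cutoff.

    Only the upper tail is flagged, since prior access would inflate
    performance on exposed items. The threshold is a hypothetical value.
    """
    return [i for i, z in enumerate(z_scores) if z > threshold]


# Hypothetical ability estimates for eight examinees.
theta_exposed = [0.1, -0.2, 0.3, 0.0, 2.5, -0.1, 0.2, 0.05]
theta_unexposed = [0.0, -0.1, 0.2, 0.1, 0.2, -0.2, 0.3, 0.0]

z_scores = standardized_differences(theta_exposed, theta_unexposed)
flagged = flag_outliers(z_scores)
print(flagged)
```

A flag produced this way would only mark an examinee for further review, as the abstract describes; it is not evidence of misconduct on its own.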


Calibrating the Medical Council of Canada’s Qualifying Examination Part I using an integrated item response theory framework: a comparison of models and designs

Champlain, Boulais, Dallas 2016

J Educ Eval Health Prof


Purpose: The aim of this research was to compare different methods of calibrating multiple choice question (MCQ) and clinical decision making (CDM) components for the Medical Council of Canada’s Qualifying Examination Part I (MCCQEI) based on item response theory. Methods: Our data consisted of test results from 8,213 first time applicants to MCCQEI in spring and fall 2010 and 2011 test administrations. The data set contained several thousand multiple choice items and several hundred CDM cases. Four dichotomous calibrations were run using BILOG-MG 3.0. All 3 mixed item format (dichotomous MCQ responses and polytomous CDM case scores) calibrations were conducted using PARSCALE 4. Results: The 2-PL model had identical numbers of items with chi-square values at or below a Type I error rate of 0.01 (83/3,499 or 0.02). In all 3 polytomous models, whether the MCQs were either anchored or concurrently run with the CDM cases, results suggest very poor fit. All IRT abilities estimated from dichotomous calibration designs correlated very highly with each other. IRT-based pass-fail rates were extremely similar, not only across calibration designs and methods, but also with regard to the actual reported decision to candidates. The largest difference noted in pass rates was 4.78%, which occurred between the mixed format concurrent 2-PL graded response model (pass rate = 80.43%) and the dichotomous anchored 1-PL calibrations (pass rate = 85.21%). Conclusion: Simpler calibration designs with dichotomized items should be implemented. The dichotomous calibrations provided better fit of the item response matrix than more complex, polytomous calibrations.
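For readers unfamiliar with the models named in this abstract, the standard textbook forms are given below. The abstract does not state the parameterization used in BILOG-MG or PARSCALE, so the notation is an assumption: $\theta_j$ is the ability of examinee $j$, $a_i$ and $b_i$ are the discrimination and difficulty of item $i$, and $b_{ik}$ is the threshold for score category $k$. Some software also includes a scaling constant $D \approx 1.7$ in the exponent, which is omitted here.

```latex
% 2-PL model for a dichotomous item i and examinee j
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\left[-a_i(\theta_j - b_i)\right]}

% 1-PL model: the same form with a common discrimination, a_i = a for all items
P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\left[-a(\theta_j - b_i)\right]}

% Graded response model for a polytomous score: probability of scoring
% in category k or higher, with ordered thresholds b_{i1} < b_{i2} < \dots
P(X_{ij} \ge k \mid \theta_j) = \frac{1}{1 + \exp\left[-a_i(\theta_j - b_{ik})\right]}
```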


Les avancées technologiques, les enjeux et les défis de la notation automatisée en éducation dans le domaine de la santé [Technological advances, issues, and challenges of automated scoring in health education]

Morin, Boulais, Champlain
