BACKGROUND
Case studies have shown that ChatGPT can run clinical simulations at the medical student level. However, no studies have assessed ChatGPT’s reliability in meeting desired simulation criteria such as medical accuracy, simulation formatting, and robust feedback mechanisms.
OBJECTIVE
To quantify ChatGPT’s ability to consistently follow formatting instructions and create simulations for preclinical medical student learners according to principles of medical simulation and multimedia educational technology.
METHODS
Using ChatGPT-4 and a prevalidated starting prompt, the authors ran 360 separate simulations of an acute asthma exacerbation. In 180 simulations the user supplied correct answers, and in the other 180 the user supplied incorrect answers. ChatGPT was evaluated on its adherence to basic simulation parameters (stepwise progression, free response, interactivity), advanced simulation parameters (autonomous conclusion, delayed feedback, comprehensive feedback), and medical accuracy (vignette, treatment updates, feedback). Significance was determined with chi-squared tests, and odds ratios were reported with 95% confidence intervals.
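The analysis above (a chi-squared test on a 2x2 table plus an odds ratio with a Wald 95% confidence interval) can be sketched in a few lines of standard-library Python. The per-arm counts below are illustrative reconstructions from the reported percentages (87% and 24% of 180 simulations delaying feedback), not figures stated in the abstract:

```python
from math import erfc, exp, log, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction) for the
    2x2 table [[a, b], [c, d]]; returns (statistic, p_value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the chi-squared survival function
    # reduces to the complementary error function: P(X > x) = erfc(sqrt(x/2)).
    p = erfc(sqrt(stat / 2))
    return stat, p

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald (log-scale) confidence interval;
    z=1.96 gives the 95% interval."""
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, (exp(log(or_) - z * se), exp(log(or_) + z * se))

# Assumed counts: delayed feedback in 156/180 Correct-arm and
# 43/180 Incorrect-arm simulations (matching the reported 87% vs 24%).
stat, p = chi2_2x2(156, 24, 43, 137)          # p << 0.001
or_, (lo, hi) = odds_ratio_ci(156, 24, 43, 137)
```

This is a minimal uncorrected Pearson test for illustration; the study's own analysis software and any continuity or multiplicity corrections are not specified in the abstract.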
RESULTS
All simulations (100%) met the basic simulation parameters and were medically accurate. Among the advanced parameters, 55% of all simulations delayed feedback; the Correct arm delayed feedback significantly more often than the Incorrect arm (87% vs 24%; p<0.001). Overall, 79% of simulations concluded autonomously, with no difference between the Correct and Incorrect arms (81% vs 77%; p=0.364), and 78% gave comprehensive feedback, again with no difference between arms (76% vs 81%; p=0.306). ChatGPT-4 was significantly more likely to conclude simulations autonomously (p<0.001) and to provide comprehensive feedback (p<0.001) when feedback was delayed than when it was not.
CONCLUSIONS
ChatGPT simulations performed perfectly on medical accuracy and the basic simulation parameters, and performed well on comprehensive feedback and autonomous conclusion. Whether feedback was delayed depended on the accuracy of user inputs. A simulation that met one advanced parameter was more likely to meet the others. These simulations have the potential to be a reliable educational tool for simple scenarios and can be evaluated with a novel nine-part metric. Further work is needed to ensure consistent performance across a broader range of simulation scenarios.