2023
DOI: 10.2196/48978
Performance of ChatGPT on the Situational Judgement Test—A Professional Dilemmas–Based Examination for Doctors in the United Kingdom

Abstract: Background ChatGPT is a large language model that has performed well on professional examinations in the fields of medicine, law, and business. However, it is unclear how ChatGPT would perform on an examination assessing professionalism and situational judgement for doctors. Objective We evaluated the performance of ChatGPT on the Situational Judgement Test (SJT): a national examination taken by all final-year medical students in the United Kingdom. Thi…

Cited by 18 publications (10 citation statements). References 14 publications.

Citation statements (ordered by relevance):
“…A plausible explanation for this discrepancy can be related to different question styles and different exam settings. Taken together, this highlights the need to assess the performance of AI-based models in various disciplines, using different question formats, and comparing them with human performance (Borchert et al, 2023; Chen et al, 2023; Deiana et al, 2023; Flores-Cohaila et al, 2023; Puladi et al, 2023). Finally, it is important to acknowledge the limitations inherent in this study.…”
Section: Discussion (mentioning)
Confidence: 91%
“…In many cases, the LLM responses scored at or above the 90th percentile, compared to human test takers. Independent research teams have subsequently examined the performance of LLMs in relation to, for example, medical situational judgment tests (Borchert et al, 2023), medical knowledge and ‘soft skills’ assessments (Brin et al, 2023), and both single-stimulus and forced-choice personality assessments (Phillips & Robie, 2024). These studies’ findings align with OpenAI’s report: advanced LLMs out-score most human test takers on several types of tests.…”
Section: The Performance of Large Language Models on Quantitative and… (mentioning)
Confidence: 72%
“…Published research suggests that LLMs often achieve high scores on tests that comprise extensive verbal information. As noted above, advanced LLMs appear to achieve higher scores than the majority of humans on many knowledge-based assessments (OpenAI, 2023), certain situational judgement tests (Arctic Shores, 2023b; Borchert et al, 2023), and personality questionnaires (if prompted to; Arctic Shores, 2023a; Phillips & Robie, 2024). Further, Elyoseph et al (2023) found that ChatGPT (GPT-3.5) outperformed most humans on an emotional awareness assessment comprising items with text descriptions of situations that required participants to identify an emotional state.…”
Section: The Test-Taking Capabilities of Large Language Models (mentioning)
Confidence: 85%
“…The second category involves the assessment of ChatGPT’s knowledge accuracy through testing, including examinations such as the United States Medical Licensing Examination, the Situational Judgement Test, and subject tests in medical school [8-12]. In this study, the majority of the students who responded regarding the feedback provided by ChatGPT stated that it demonstrated a high degree of accuracy.…”
Section: Discussion (mentioning)
Confidence: 99%