2023
DOI: 10.3389/fonc.2023.1219326

Evaluating large language models on a highly-specialized topic, radiation oncology physics

Jason Holmes,
Zhengliang Liu,
Lian Zhang
et al.

Abstract: Purpose: We present the first study to investigate Large Language Models (LLMs) in answering radiation oncology physics questions. Because popular exams like AP Physics, LSAT, and GRE have large test-taker populations and ample test preparation resources in circulation, they may not allow for accurately assessing the true potential of LLMs. This paper proposes evaluating LLMs on a highly-specialized topic, radiation oncology physics, which may be more pertinent to scientific and medical communities in addition t…
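
The abstract describes posing specialized exam-style questions to LLMs and scoring the answers. Purely as an illustration (not the authors' actual pipeline), here is a minimal sketch of such a multiple-choice evaluation loop, assuming the OpenAI Python client; the model name, the questions.json file, and the letter-grading scheme are hypothetical.

```python
# Minimal sketch of a multiple-choice LLM evaluation loop.
# Assumptions (not from the paper): the OpenAI Python client, a JSON file
# of questions with fields "question", "choices", and "answer", and simple
# exact-match grading of the returned letter.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str, choices: list[str]) -> str:
    """Pose one multiple-choice question and return the model's letter answer."""
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
    prompt = (
        "Answer the following radiation oncology physics question.\n"
        f"{question}\n{options}\n"
        "Reply with the single letter of the best answer."
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()[:1].upper()

with open("questions.json") as f:  # hypothetical exam file
    exam = json.load(f)

correct = sum(ask(q["question"], q["choices"]) == q["answer"] for q in exam)
print(f"Accuracy: {correct}/{len(exam)}")
```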

Cited by 51 publications (20 citation statements)
References 21 publications
“…These comparisons highlighted the potential of ChatGPT in higher educational assessments; nevertheless, it showed the importance of ongoing refinements of these models and the dangers of inaccuracies it poses (Lo, 2023; Sallam, 2023; Sallam et al., 2023d; Gill et al., 2024). However, making direct comparisons across variable studies can be challenging due to differences in models implemented, subject fields of the exams, test dates, and the exact approaches of prompt construction (Holmes et al., 2023; Huynh Linda et al., 2023; Meskó, 2023; Oh et al., 2023; Skalidis et al., 2023; Yaa et al., 2023).…”
Section: Discussion (mentioning)
confidence: 99%
“…Because of the inherent nature of their learning, LLMs predict the next token (word or phrase), which may or may not always be factually true. Despite these constraints, recent experiments with ChatGPT taking standardised tests have yielded remarkable results [34][35][36]. This demonstrated that ChatGPT, and LLMs in general, have the emergent ability to perform critical reasoning and answer complex questions.…”
Section: LLM as a Decision Support Tool (mentioning)
confidence: 99%
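
The quoted passage rests on the fact that a causal language model simply predicts the most likely next token, true or not. Purely as an illustration (the library and model choice, Hugging Face transformers with GPT-2, are assumptions and not from the cited works), a minimal sketch of greedy next-token generation:

```python
# Illustrative sketch of greedy next-token prediction, the mechanism the
# quoted passage refers to. Library and model (Hugging Face transformers,
# GPT-2) are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The absorbed dose is measured in"
ids = tokenizer(text, return_tensors="pt").input_ids

for _ in range(5):  # append five tokens, one at a time
    with torch.no_grad():
        logits = model(ids).logits          # scores for every vocabulary token
    next_id = logits[0, -1].argmax()        # greedy: take the single most likely token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
# The model always emits the most probable continuation, whether or not
# that continuation is factually true.
```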
“…For instance, ChatGPT has shown remarkable accuracy in reasoning questions and medical exams [43,44], even successfully passing the Chinese Medical Licensing Exam [45] and the United States Medical Licensing Examination (USMLE) [46]. It also performed well in addressing radiation oncology physics exam questions [47]. Likewise, "ChatGPT would have been at the 87th percentile of Bunting's 2013 international cohort for the Cardiff Fertility Knowledge Scale and at the 95th percentile on the basis of Kudesia's 2017 cohort for the Fertility and Infertility Treatment Knowledge Score" [48].…”
Section: Medical Knowledge Inquiry (mentioning)
confidence: 99%