2023
DOI: 10.1001/jamanetworkopen.2023.46721
Performance of Large Language Models on a Neurology Board–Style Examination

Marc Cicero Schubert,
Wolfgang Wick,
Varun Venkataramani

Abstract: Importance: Recent advancements in large language models (LLMs) have shown potential in a wide array of applications, including health care. While LLMs showed heterogeneous results across specialized medical board examinations, the performance of these models in neurology board examinations remains unexplored. Objective: To assess the performance of LLMs on neurology board–style examinations. Design, Setting, and Participants: This cross-sectional study was conducted between May 17 and May 31, 2023. The evaluation uti…

Cited by 25 publications (6 citation statements) · References 28 publications
“…There are multiple ways that can provide users with greater confidence about the accuracy of AI search results. The use of a "custom attribution engine" as announced by Adobe would allow users to verify AI findings through source citation [14]. This type of approach should allow users to interpret narrative results in terms of source information and determine if there is any distortion in the AI narrative.…”
Section: Discussion
confidence: 99%
“…The study found that ChatGPT models performed notably poorer on questions involving concepts of redefinition or invention [27]. Emerging evidence suggests that AI may exhibit its own cognitive process, as indirectly indicated by a trend of improved performance on questions at the lower levels of Bloom's taxonomy, particularly in disciplines such as neurology, radiology, physiology, microbiology, and biochemistry [28][29][30][31][32]. In these investigations, the majority of questions assessed originated from internal materials or were inaccessible to ChatGPT models during the study periods.…”
Section: Comparison With Prior Work
confidence: 94%
“…Previous publications evaluating LLMs across various disciplines have covered fields such as gastroenterology [7], pathology [8], neurology [9], physiology [6,10], and solving case vignettes in physiology [11]. In a cross-sectional study, the performance of LLMs on neurology board–style examinations was assessed using a question bank approved by the American Board of Psychiatry and Neurology.…”
Section: Introduction
confidence: 99%
“…In a cross-sectional study, the performance of LLMs on neurology board–style examinations was assessed using a question bank approved by the American Board of Psychiatry and Neurology. The questions were categorized into lower-order and higher-order based on the Bloom taxonomy for learning and assessment [9]. To the best of our knowledge, there was no study specifically evaluating LLMs in the field of neurophysiology.…”
Section: Introduction
confidence: 99%