2024
DOI: 10.1101/2024.04.24.24306315
Preprint

Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark

Fenglin Liu, Zheng Li, Hongjian Zhou, et al.

Abstract: The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering task with answer options for evaluation. However, in real clinical settings, many clinical decisions, such as treatment recommendations, involve answering open-ended questions without pre-set options. Meanwhile, existing studies mainly use accuracy to assess model performance. In this paper, we comprehensively benchmark diverse LLMs in healthcare, …

Cited by 2 publications
References 54 publications