2023
DOI: 10.1101/2023.02.13.23285745
Preprint

ChatGPT- versus human-generated answers to frequently asked questions about diabetes: a Turing test-inspired survey among employees of a Danish diabetes center

Abstract: Background: Large language models have received enormous attention recently, with some studies demonstrating their potential clinical value despite not being trained specifically for this domain. We aimed to investigate whether ChatGPT, a language model optimized for dialogue, can answer frequently asked questions about diabetes. Methods: We conducted a closed e-survey among employees of a large Danish diabetes center. The study design was inspired by the Turing test and non-inferiority trials. Our survey includ…


Citations: cited by 7 publications (6 citation statements)
References: 27 publications (36 reference statements)
“…The outcome of the game rests on two interrelated factors: first, the capacity of the machine to produce communicative artefacts that imitate the attributes of those produced by humans. As has been alluded to already, and will be reviewed in more detail below, there is emerging evidence that ChatGPT currently possesses ample capacity to interpret text prompts and produce sophisticated human-like texts (Gao et al., 2023; Hulman et al., 2023; Nov et al., 2023); second, and the focus of the current study, is the degree to which the human interrogator is sensitive to the attributes of communicative artefacts that indicate whether they are produced by a human or a machine.…”
Section: Introduction (mentioning)
Confidence: 93%
“…Another recent study inspired by Turing's Imitation Game was undertaken by Hulman et al (2023) among 183 employees of a large health provider service in Denmark. The objective of the study was to determine how adequately ChatGPT could answer 10 frequently asked questions that were of relevance to the healthcare service (i.e., questions about diabetes).…”
Section: The Imitation Game Paradigm to Investigate ChatGPT (mentioning)
Confidence: 99%
“…FActScore [29] evaluates text generated by LLMs via an evaluation method that breaks a generation into atomic facts, which are in turn evaluated by human evaluators. Majority voting for evaluating healthcare-related answers generated by LLMs has been employed for myopia care [21], maternity [21], diabetes [26], cancer [27], infant care [21], etc.…”
Section: B. Measuring Hallucinations (mentioning)
Confidence: 99%
“…A study conducted by Hulman et al. (2023) to evaluate the answers given by ChatGPT to diabetes-related questions against answers given by humans reports that ChatGPT's answers to two out of 10 diabetes-related questions contained misinformation. In light of this, the quality of the answers provided by the more traditional CHeQA approaches described in this survey, whose answers come from validated scientific content, may be higher than that of answers provided by systems based on generic LLMs or PLMs fine-tuned on smaller sets of domain-specific data.…”
Section: Future Directions (mentioning)
Confidence: 99%