BACKGROUND
Tools that help identify preoperative patients who need further cardiovascular testing or consultation may reduce costs and ensure rational utilization of resources.
OBJECTIVE
We evaluated the feasibility of using general-purpose versus domain-specific large language models (LLMs) for a classification task aimed at identifying these surgical patients.
METHODS
This study leveraged various LLMs to classify patients who would need preoperative cardiac evaluation based on their preoperative clinical notes. Three general-purpose models (BERT, RoBERTa, Longformer) and two domain-specific models (BioClinicalBERT, PubMedBERT) were trained on this classification task. Performance was validated on the test set, and the area under the receiver operating characteristic curve (AUC), F1-score, sensitivity, specificity, precision, and recall were measured.
RESULTS
There were 175 patients, of whom 67 (38.2%) were determined to require preoperative cardiac evaluation or testing. The dataset was divided into a training set (75%, n=131) and a test set (25%, n=44). All models performed similarly: the AUC was highest with Longformer (0.90) and the precision-recall score was highest with PubMedBERT (0.88).
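The 75%/25% split reported above can be reproduced with scikit-learn's `train_test_split`. The notes and labels below are stand-ins (175 placeholder records, 67 positive), not the study's data; a stratified split is assumed here to preserve the class balance, though the abstract does not state whether stratification was used.

```python
# Sketch of the dataset split: 175 records -> 131 train / 44 test.
# X and y are placeholders, not real clinical data.
from sklearn.model_selection import train_test_split

X = [f"note_{i}" for i in range(175)]  # placeholder preoperative notes
y = [1] * 67 + [0] * 108               # 67 positive (38.2%), 108 negative

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
# len(X_train) == 131, len(X_test) == 44 (test size rounds up: ceil(175 * 0.25) = 44)
```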
CONCLUSIONS
This study described the use of three general-purpose and two domain-specific LLMs to classify surgical patients in need of preoperative cardiovascular workup. All LLMs had excellent yet similar performance. LLMs may be applied to preoperative clinical notes to classify which patients would benefit from preoperative cardiology evaluation. No clinically significant differences were observed between domain-specific and general-purpose LLMs.