2023
DOI: 10.1016/j.fertnstert.2023.05.151

The promise and peril of using a large language model to obtain clinical information: ChatGPT performs strongly as a fertility counseling tool with limitations

Cited by 34 publications (27 citation statements)
References 30 publications
“…It also performed well in addressing radiation oncology physics exam questions [47]. Likewise, “ChatGPT would have been at the 87th percentile of Bunting’s 2013 international cohort for the Cardiff Fertility Knowledge Scale and at the 95th percentile on the basis of Kudesia’s 2017 cohort for the Fertility and Infertility Treatment Knowledge Score” [48]. In addition, ChatGPT showed promising results in a simulated Ophthalmic Knowledge Assessment Program (OKAP) exam [49].…”
Section: Results (mentioning)
confidence: 99%
“…An overview of the presence of codes for each study is provided in Supplementary Section 3. The majority of articles investigated the use and feasibility of LLMs as medical chatbots (n=84/89, 94.4%) 13,24–62,64,66,68,69,71–96,98–111, while fewer reports additionally or exclusively focused on the generation of patient information (n=19/89, 21.4%) 24,31,43,48,49,57,59,62,67,70,79,88–91,97,102,106,107, including clinical documentation such as informed consent forms (n=5/89, 5.6%) 43,67,91,97,102 and discharge instructions (n=1/89, 1.1%) 31, or translation/summarization tasks of medical texts (n=5/89, 5.6%) 24,49,57,79,89, creation of patient education materials (n=5/89, 5.6%) 48,62,90,106,107, and simplification of radiology reports (n=2/89, 2.3%) 59,88. Most reports evaluated LLMs in English (n=88/89, 98.9%) 13,24–103,105–111, followed by Arabic (n=2/84, 2.3%) 32,104, Mandarin (n=2/84, 2.3%) 36,75, and Korean or Spanish (n=1/89, 1.1%, respectively) 75.…”
Section: Results (mentioning)
confidence: 99%
“…In terms of design limitations, many authors noted the limitation that LLMs are not optimized for medical use (n=46/89, 51.7%) 13,26,28,34,35,3739,46,49,50,5459,61,62,65,66,68,70,71,7981,8385,88,91,9398,100107,109 , including implicit knowledge/lack of clinical context (n=13/89, 14.6%) 28,39,46,66,71,79,81,8385,98,103 , limitations in clinical reasoning (n=7/89, 7.9%) 55,84,95,102105 , limitations in medical image processing/production (n=5/89, 5.6%) 37,55,91,106,107 , and misunderstanding of medical information and terms by the model (n=7/89, 7.9%) 28,38,39,59,62,65,97 . In addition, data-related limitations were identified, including limited access to data on the internet (n=22/89, 24.7%) 38,39,41,43,5457,59,60,64,76,79,8284,88,91,94,96,104,109 , the undisclosed origin of training data (n=36/89, 40.5%) 25,26,29,30,32,34,36,37,40,46,…”
Section: Results (mentioning)
confidence: 99%
“…LLMs have also shown potential to provide responses to real-world health questions. When presented with frequently asked clinical queries, ChatGPT is able to produce relevant, meaningful responses comparable to established sources [20,21 ▪▪ ]. One study even showed that a panel of licensed healthcare professionals preferred ChatGPT's responses to patient questions 79% of the time [22].…”
Section: Text Of Review (mentioning)
confidence: 99%