2023
DOI: 10.7759/cureus.42214
ChatGPT's Ability to Assess Quality and Readability of Online Medical Information: Evidence From a Cross-Sectional Study

Roei Golan,
Sarah J Ripps,
Raghuram Reddy
et al.

Abstract: Introduction: Artificial intelligence (AI) platforms have gained widespread attention for their distinct ability to generate automated responses to various prompts. However, their role in assessing the quality and readability of a provided text remains unclear. Thus, the purpose of this study is to evaluate the proficiency of the conversational generative pre-trained transformer (ChatGPT) in utilizing the DISCERN tool to evaluate the quality of online content regarding shock wave therapy for erectile d…

Cited by 20 publications (20 citation statements)
References 18 publications
“…Researchers also specifically stated that 11.5% of the 104 questions asked to ChatGPT were answered incorrectly. [34] Golan et al. [35] assessed the quality of online content on shock wave therapy for erectile dysfunction, and found that ChatGPT’s performance in assessing the quality of text content was inadequate and was not consistent with standards set by human evaluators and reliable tools. [35] Momenaei et al. [36] evaluated the relevance and readability of the medical information provided by ChatGPT-4 regarding common vitreoretinal surgeries for retinal detachments (RDs), macular holes (MHs), and epiretinal membranes (ERMs).…”
Section: Discussion (mentioning)
confidence: 99%
“…[34] Golan et al. [35] assessed the quality of online content on shock wave therapy for erectile dysfunction, and found that ChatGPT’s performance in assessing the quality of text content was inadequate and was not consistent with standards set by human evaluators and reliable tools. [35] Momenaei et al. [36] evaluated the relevance and readability of the medical information provided by ChatGPT-4 regarding common vitreoretinal surgeries for retinal detachments (RDs), macular holes (MHs), and epiretinal membranes (ERMs). They noted that the mean Flesch-Kincaid Grade Level and Flesch Reading Ease Score were 14.1 ± 2.6 and 32.3 ± 10.8 for RD, 14 ± 1.3 and 34.4 ± 7 for MH, and 14.8 ± 1.3 and 28.1 ± 7.5 for ERM, respectively.…”
Section: Discussion (mentioning)
confidence: 99%
“…Notably, it demonstrated impressive performance in summarizing conference panels and recommendations [27], generating research questions [28], extracting data from literature abstracts [29], drafting medical papers based on given datasets [30], and generating references from medical articles [31]. ChatGPT was also utilized to evaluate the quality and readability of online medical text regarding shockwave therapy for erectile dysfunction [32]. These applications highlighted the potential of LLMs to condense complex and extensive research materials, allowing for more accessible comprehension and utilization of information in healthcare.…”
Section: Summarization (mentioning)
confidence: 99%
“…Accuracy. Several studies highlighted that ChatGPT exhibited inaccuracies when asked to respond to certain questions [14,18,23,29,32,34,35,38,43,50,52,53,64,65,67,71,72]. For instance, ChatGPT could respond with incomplete information or exhibit an inability to distinguish between truth and falsehood [21,69].…”
Section: Reliability (mentioning)
confidence: 99%
“…This was demonstrated when findings related to shockwave therapy for erectile dysfunction diverged from those generated by human experts and trusted tools like DISCERN. [13] AI-based chatbots, including ChatGPT, are programmed and trained using data pertaining to psychiatric conditions. The convenience, ease, and simulation of human-like conversation render these applications valuable tools for delivering therapy within the realm of psychiatry.…”
Section: ChatGPT in Healthcare (mentioning)
confidence: 99%