Medical image captioning via generative pretrained transformers

Selivanov, Alexander; Rogov, Oleg Y.; Chesakov, Daniil; Shelmanov, Artem; Fedulova, Irina; Dylov, Dmitry V.

doi:10.1038/s41598-023-31223-5

Cited by 30 publications

(10 citation statements)

References 48 publications

“…Consequently, descriptive radiologic findings (text data) were employed to generate the final diagnoses. Future researches may benefit from integrating image segmentation and image captioning AI models to produce descriptive radiologic findings, which can then serve as the basis for subsequent diagnostic inferences by ChatGPT [25, 26]. Image captioning is the task of describing the visual content of an image in natural language, employing a visual understanding system and a language model capable of generating meaningful and syntactically correct sentences [27].…”

Section: Results (mentioning)

confidence: 99%

Exploring the potential of ChatGPT as an adjunct for generating diagnosis based on chief complaint and cone beam CT radiologic findings

Hu, Liu et al. 2024

BMC Med Inform Decis Mak

Aim This study aimed to assess the performance of OpenAI’s ChatGPT in generating diagnosis based on chief complaint and cone beam computed tomography (CBCT) radiologic findings. Materials and methods 102 CBCT reports (48 with dental diseases (DD) and 54 with neoplastic/cystic diseases (N/CD)) were collected. ChatGPT was provided with chief complaint and CBCT radiologic findings. Diagnostic outputs from ChatGPT were scored based on five-point Likert scale. For diagnosis accuracy, the scoring was based on the accuracy of chief complaint related diagnosis and chief complaint unrelated diagnoses (1–5 points); for diagnosis completeness, the scoring was based on how many accurate diagnoses included in ChatGPT’s output for one case (1–5 points); for text quality, the scoring was based on how many text errors included in ChatGPT’s output for one case (1–5 points). For 54 N/CD cases, the consistence of the diagnosis generated by ChatGPT with pathological diagnosis was also calculated. The constitution of text errors in ChatGPT’s outputs was evaluated. Results After subjective ratings by expert reviewers on a five-point Likert scale, the final score of diagnosis accuracy, diagnosis completeness and text quality of ChatGPT was 3.7, 4.5 and 4.6 for the 102 cases. For diagnostic accuracy, it performed significantly better on N/CD (3.8/5) compared to DD (3.6/5). For 54 N/CD cases, 21(38.9%) cases have first diagnosis completely consistent with pathological diagnosis. No text errors were observed in 88.7% of all the 390 text items. Conclusion ChatGPT showed potential in generating radiographic diagnosis based on chief complaint and radiologic findings. However, the performance of ChatGPT varied with task complexity, necessitating professional oversight due to a certain error rate.

“…For NLG metrics, ECG-GPT matches or outperforms most state-of-the-art medical image captioning models. [22][23][24] For ROUGE scores, which measure the overlap of word sequences between the generated and reference diagnosis statements, emphasizing recall, we report scores of 0.748 and 0.742 for ROUGE-1 and ROUGE-L, respectively. For BLEU scores, which focus on precision and assess the quality of model-generated statements, we report scores ranging from 0.619 for BLEU-1 to 0.472 for BLEU-4.…”
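
The recall/precision distinction drawn in the statement above can be illustrated with a minimal sketch. This is a simplified unigram version written for illustration only, not the evaluation code used by the citing paper: full BLEU combines several n-gram orders with a brevity penalty, and ROUGE is often reported as an F-measure. The function names and the two example ECG statements are hypothetical.

```python
from collections import Counter


def unigram_overlap(candidate: str, reference: str) -> int:
    """Count unigrams shared by both texts, clipped to the smaller count."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    return sum((cand_counts & ref_counts).values())


def rouge1_recall(candidate: str, reference: str) -> float:
    """Share of reference unigrams recovered by the candidate (recall-oriented)."""
    ref_len = len(reference.split())
    return unigram_overlap(candidate, reference) / ref_len if ref_len else 0.0


def bleu1_precision(candidate: str, reference: str) -> float:
    """Share of candidate unigrams found in the reference (precision-oriented).

    Simplification: no brevity penalty and no higher-order n-grams.
    """
    cand_len = len(candidate.split())
    return unigram_overlap(candidate, reference) / cand_len if cand_len else 0.0


# Hypothetical statements, invented for illustration only.
reference = "sinus rhythm with first degree av block"
candidate = "sinus rhythm with av block"

print(rouge1_recall(candidate, reference))    # 5 of 7 reference tokens recovered
print(bleu1_precision(candidate, reference))  # all 5 candidate tokens appear in the reference
```

In this toy case the candidate omits two reference words, so recall falls while precision stays at its maximum, which is why the two metric families are usually reported together.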

Section: Internal Testing - NLG Agreement (mentioning)

confidence: 99%

Automated Diagnostic Reports from Images of Electrocardiograms at the Point-of-Care

Khunte, Sangha, Oikonomou et al. 2024

Preprint

Timely and accurate assessment of electrocardiograms (ECGs) is crucial for diagnosing, triaging, and clinically managing patients. Current workflows rely on a computerized ECG interpretation using rule-based tools built into the ECG signal acquisition systems with limited accuracy and flexibility. In low-resource settings, specialists must review every single ECG for such decisions, as these computerized interpretations are not available. Additionally, high-quality interpretations are even more essential in such low-resource settings as there is a higher burden of accuracy for automated reads when access to experts is limited. Artificial Intelligence (AI)-based systems have the prospect of greater accuracy yet are frequently limited to a narrow range of conditions and do not replicate the full diagnostic range. Moreover, these models often require raw signal data, which are unavailable to physicians and necessitate costly technical integrations that are currently limited. To overcome these challenges, we developed and validated a format-independent vision encoder-decoder model - ECG-GPT - that can generate free-text, expert-level diagnosis statements directly from ECG images. The model shows robust performance, validated on 2.6 million ECGs across 6 geographically distinct health settings: (1) 2 large and diverse US health systems- Yale-New Haven and Mount Sinai Health Systems, (2) a consecutive ECG dataset from a central ECG repository from Minas Gerais, Brazil, (3) the prospective cohort study, UK Biobank, (4) a Germany-based, publicly available repository, PTB-XL, and (5) a community hospital in Missouri. The model demonstrated consistently high performance (AUROC≥0.81) across a wide range of rhythm and conduction disorders. This can be easily accessed via a web-based application capable of receiving ECG images and represents a scalable and accessible strategy for generating accurate, expert-level reports from images of ECGs, enabling accurate triage of patients globally, especially in low-resource settings.

“…1,2 They have been used for a wide range of applications in health care, including predicting length of postsurgical hospital stay, captioning medical images, summarizing radiology reports, and named entity recognition of electronic health record notes. [3][4][5][6] Among these models, ChatGPT (OpenAI) has emerged as a particularly powerful tool based on GPT-3.5 that was designed specifically for the task of generating natural and contextually appropriate responses in a conversational setting. Building on the GPT-3 model, GPT-3.5 was trained on a larger corpus of textual data and with additional training techniques like Reinforcement Learning from Human Feedback (RLHF), which incorporates human knowledge and expertise into the model.…”

Section: Introduction (mentioning)

confidence: 99%

“…These models, including bidirectional encoder representations from transformers (BERT) and generative pretrained transformer 3 (GPT-3), are trained on massive amounts of text data and excel at natural language processing tasks such as text summarization or responding to queries. They have been used for a wide range of applications in health care, including predicting length of postsurgical hospital stay, captioning medical images, summarizing radiology reports, and named entity recognition of electronic health record notes…”

Section: Introduction (mentioning)

confidence: 99%

Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions

Bernstein, Zhang, Govil et al. 2023

JAMA Netw Open

Importance Large language models (LLMs) like ChatGPT appear capable of performing a variety of tasks, including answering patient eye care questions, but have not yet been evaluated in direct comparison with ophthalmologists. It remains unclear whether LLM-generated advice is accurate, appropriate, and safe for eye patients. Objective To evaluate the quality of ophthalmology advice generated by an LLM chatbot in comparison with ophthalmologist-written advice. Design, Setting, and Participants This cross-sectional study used deidentified data from an online medical forum, in which patient questions received responses written by American Academy of Ophthalmology (AAO)–affiliated ophthalmologists. A masked panel of 8 board-certified ophthalmologists were asked to distinguish between answers generated by the ChatGPT chatbot and human answers. Posts were dated between 2007 and 2016; data were accessed January 2023 and analysis was performed between March and May 2023. Main Outcomes and Measures Identification of chatbot and human answers on a 4-point scale (likely or definitely artificial intelligence [AI] vs likely or definitely human) and evaluation of responses for presence of incorrect information, alignment with perceived consensus in the medical community, likelihood to cause harm, and extent of harm. Results A total of 200 pairs of user questions and answers by AAO-affiliated ophthalmologists were evaluated. The mean (SD) accuracy for distinguishing between AI and human responses was 61.3% (9.7%). Of 800 evaluations of chatbot-written answers, 168 answers (21.0%) were marked as human-written, while 517 of 800 human-written answers (64.6%) were marked as AI-written. Compared with human answers, chatbot answers were more frequently rated as probably or definitely written by AI (prevalence ratio [PR], 1.72; 95% CI, 1.52-1.93). The likelihood of chatbot answers containing incorrect or inappropriate material was comparable with human answers (PR, 0.92; 95% CI, 0.77-1.10), and did not differ from human answers in terms of likelihood of harm (PR, 0.84; 95% CI, 0.67-1.07) nor extent of harm (PR, 0.99; 95% CI, 0.80-1.22). Conclusions and Relevance In this cross-sectional study of human-written and AI-generated responses to 200 eye care questions from an online advice forum, a chatbot appeared capable of responding to long user-written eye health posts and largely generated appropriate responses that did not differ significantly from ophthalmologist-written responses in terms of incorrect information, likelihood of harm, extent of harm, or deviation from ophthalmologist community standards. Additional research is needed to assess patient attitudes toward LLM-augmented ophthalmologists vs fully autonomous AI content generation, to evaluate clarity and acceptability of LLM-generated answers from the patient perspective, to test the performance of LLMs in a greater variety of clinical contexts, and to determine an optimal manner of utilizing LLMs that is ethical and minimizes harm.
