2023
DOI: 10.1038/s41598-023-31223-5

Medical image captioning via generative pretrained transformers

Abstract: The proposed model for automatic clinical image caption generation combines the analysis of radiological scans with structured patient information from the textual records. It uses two language models, the Show-Attend-Tell and the GPT-3, to generate comprehensive and descriptive radiology records. The generated textual summary contains essential information about pathologies found, their location, along with the 2D heatmaps that localize each pathology on the scans. The model has been tested on two medical dat…

Citations: cited by 30 publications (10 citation statements)
References: 48 publications
“…Consequently, descriptive radiologic findings (text data) were employed to generate the final diagnoses. Future researches may benefit from integrating image segmentation and image captioning AI models to produce descriptive radiologic findings, which can then serve as the basis for subsequent diagnostic inferences by ChatGPT [ 25 , 26 ]. Image captioning is the task of describing the visual content of an image in natural language, employing a visual understanding system and a language model capable of generating meaningful and syntactically correct sentences [ 27 ].…”
Section: Results (mentioning)
confidence: 99%
“…For NLG metrics, ECG-GPT matches or outperforms most state-of-the-art medical image captioning models. [22][23][24] For ROUGE scores, which measure the overlap of word sequences between the generated and reference diagnosis statements, emphasizing recall, we report scores of 0.748 and 0.742 for ROUGE-1 and ROUGE-L, respectively. For BLEU scores, which focus on precision and assess the quality of model-generated statements, we report scores ranging from 0.619 for BLEU-1 to 0.472 for BLEU-4.…”
Section: Internal Testing - NLG Agreement (mentioning)
confidence: 99%
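
To make the recall/precision distinction in the statement above concrete, here is a minimal sketch of unigram overlap scoring. It is not the evaluation code used by the citing paper: the function names and the example sentences are hypothetical, tokenization is plain whitespace splitting, and the precision function omits the brevity penalty that full BLEU applies.

```python
from collections import Counter


def unigram_overlap(candidate: str, reference: str) -> int:
    """Count unigrams shared by both texts, clipped to the smaller count."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    return sum((cand_counts & ref_counts).values())


def rouge1_recall(candidate: str, reference: str) -> float:
    """ROUGE-1 recall: shared unigrams divided by reference length."""
    ref_len = len(reference.split())
    return unigram_overlap(candidate, reference) / ref_len if ref_len else 0.0


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision (the core of BLEU-1, without brevity penalty)."""
    cand_len = len(candidate.split())
    return unigram_overlap(candidate, reference) / cand_len if cand_len else 0.0


# Hypothetical example statements, not taken from any dataset.
reference = "sinus rhythm with left bundle branch block"
candidate = "sinus rhythm with bundle branch block"

print(rouge1_recall(candidate, reference))      # 6 of 7 reference tokens matched
print(unigram_precision(candidate, reference))  # 6 of 6 candidate tokens matched
```

The candidate drops one reference word, so recall falls below 1 while precision stays at 1, which is why the two metric families are usually reported together.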
“…1,2 They have been used for a wide range of applications in health care, including predicting length of postsurgical hospital stay, captioning medical images, summarizing radiology reports, and named entity recognition of electronic health record notes. [3][4][5][6] Among these models, ChatGPT (OpenAI) has emerged as a particularly powerful tool based on GPT-3.5 that was designed specifically for the task of generating natural and contextually appropriate responses in a conversational setting. Building on the GPT-3 model, GPT-3.5 was trained on a larger corpus of textual data and with additional training techniques like Reinforcement Learning from Human Feedback (RLHF), which incorporates human knowledge and expertise into the model.…”
Section: Introduction (mentioning)
confidence: 99%
“…These models, including bidirectional encoder representations from transformers (BERT) and generative pretrained transformer 3 (GPT-3), are trained on massive amounts of text data and excel at natural language processing tasks such as text summarization or responding to queries. They have been used for a wide range of applications in health care, including predicting length of postsurgical hospital stay, captioning medical images, summarizing radiology reports, and named entity recognition of electronic health record notes…”
Section: Introduction (mentioning)
confidence: 99%