Findings of the Association for Computational Linguistics: EMNLP 2021
DOI: 10.18653/v1/2021.findings-emnlp.241

Progressive Transformer-Based Generation of Radiology Reports

Abstract: Inspired by Curriculum Learning, we propose a consecutive (i.e., image-to-text-to-text) generation framework where we divide the problem of radiology report generation into two steps. Contrary to generating the full radiology report from the image at once, the model generates global concepts from the image in the first step and then reforms them into finer and coherent texts using a transformer architecture. We follow the transformer-based sequence-to-sequence paradigm at each step. We improve upon the state-of…
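As a rough illustration of the two-step pipeline the abstract describes, the sketch below wires a CNN feature extractor into a BART sequence-to-sequence rewriter. The model choices (ResNet-50, facebook/bart-base), the dummy input, and the hard-coded concept string are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the two-step (image -> concepts -> report) idea.
import torch
from torchvision.models import resnet50
from transformers import BartForConditionalGeneration, BartTokenizer

# Step 1: a CNN backbone extracts visual features; in the paper a transformer
# decoder turns these into a short sequence of global concepts.
cnn = resnet50(weights=None)
cnn.fc = torch.nn.Identity()            # keep the 2048-d pooled feature
image = torch.randn(1, 3, 224, 224)     # dummy chest X-ray tensor
visual_features = cnn(image)            # shape: (1, 2048)

# We hard-code a plausible step-1 output for illustration.
concepts = "cardiomegaly pleural effusion no pneumothorax"

# Step 2: a sequence-to-sequence language model (BART in the cited work)
# rewrites the concepts into a fluent report. A fine-tuned checkpoint would
# be needed for real reports; the base model here just shows the interface.
tok = BartTokenizer.from_pretrained("facebook/bart-base")
lm = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
inputs = tok(concepts, return_tensors="pt")
report_ids = lm.generate(**inputs, max_length=64, num_beams=4)
print(tok.decode(report_ids[0], skip_special_tokens=True))
```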

Cited by 31 publications (10 citation statements) · References 18 publications
“…LLMs have demonstrated significant performance gains for medical problem summarization tasks [42]. We note preliminary reports of the use of LLMs to generate radiological reports from images [43]. These examples demonstrate potential use cases for medical charting assistance.…”
Section: Discussion (mentioning)
confidence: 99%
“…They proposed a relational memory (RM) module to retain knowledge from previous cases, thereby enabling the generator model to remember similar reports when generating current reports. Another study (7) proposed a progressive Transformer-based framework for report generation, which generates high-level context from the given X-ray and then employs the Transformer architecture to convert this context into a radiology report. This model comprises a pre-trained CNN as a visual backbone, a mesh-memory Transformer (17) as a visual-language model, and BART (21) as a language model.…”
Section: Relevant Literature (mentioning)
confidence: 99%
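The relational memory idea mentioned above can be pictured as a small slot matrix updated at each decoding step by attending over itself and the previous output embedding, with a gated residual. The toy module below follows that general recipe; the slot count, gating form, and shapes are simplified assumptions, not the exact published module.

```python
import torch
import torch.nn as nn

class RelationalMemory(nn.Module):
    def __init__(self, num_slots: int = 3, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_slots, d_model))  # initial slots
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(d_model * 2, d_model * 2)

    def forward(self, memory: torch.Tensor, prev_token: torch.Tensor) -> torch.Tensor:
        # memory: (batch, slots, d); prev_token: (batch, 1, d)
        kv = torch.cat([memory, prev_token], dim=1)   # attend over memory + last output
        update, _ = self.attn(memory, kv, kv)
        forget, input_ = self.gate(torch.cat([memory, update], dim=-1)).chunk(2, dim=-1)
        # Gated residual: keep part of the old memory, write part of the update.
        return torch.sigmoid(forget) * memory + torch.sigmoid(input_) * torch.tanh(update)

rm = RelationalMemory()
mem = rm.memory.unsqueeze(0)     # (1, 3, 512) initial memory
tok = torch.randn(1, 1, 512)     # embedding of the previous output token
mem = rm(mem, tok)               # updated memory, carried across decoding steps
print(mem.shape)                 # torch.Size([1, 3, 512])
```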
“…Our BLEU-1 results exhibit strong concordance with existing report generation literature, which has established scoring norms averaging 0.3 to 0.4 for this metric. We conducted a comparative analysis of our model against relevant state-of-the-art models (23, 24, 7, 22), referencing the results documented in their published literature. On ROUGE-L, which reflects a model’s ability to capture document-level linguistic coherence, our approach excelled, achieving 0.331, the highest score across all models.…”
Section: Experiments and Analysis (mentioning)
confidence: 99%
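For context on how the quoted numbers are produced, the snippet below computes BLEU-1 and ROUGE-L for a pair of made-up report sentences using NLTK and the rouge-score package; the texts are illustrative only, not from any evaluated model.

```python
from nltk.translate.bleu_score import sentence_bleu
from rouge_score import rouge_scorer

reference = "the heart size is normal and the lungs are clear"
candidate = "heart size is normal lungs are clear without effusion"

# BLEU-1: unigram precision with brevity penalty (weights select 1-grams only).
bleu1 = sentence_bleu([reference.split()], candidate.split(),
                      weights=(1.0, 0, 0, 0))

# ROUGE-L: F-measure over the longest common subsequence, which is why it is
# read as a proxy for document-level coherence.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, candidate)["rougeL"].fmeasure

print(f"BLEU-1={bleu1:.3f}  ROUGE-L={rouge_l:.3f}")
```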
“…First, they identified the disease from the image using a CNN and then used prior knowledge for report generation. Nooralahzadeh et al. [20] proposed a two-step model which derived global concepts from the image and then reformed them into finer and coherent texts using a transformer architecture. You et al. [22] proposed an AlignTransformer framework, whose components were Align Hierarchical Attention (AHA) and a Multi-Grained Transformer (MGT).…”
Section: Medical Report Generation (mentioning)
confidence: 99%