2023
DOI: 10.1016/j.displa.2023.102377

Relational-Convergent Transformer for image captioning

Cited by 8 publications (1 citation statement)
References 23 publications
“…Early works [22] developed models based on long short-term memory (LSTM [23]) networks [14,24,25] and convolutional neural network (CNN) models. Recently, attention modules [13,27,28] based on the transformer [26] model have been widely used because of their strong ability to handle visual and language features. Although their performance is attractive and image captioning is somewhat similar to radiology report generation, these methods from general image captioning have limited applicability to the radiology report generation task [29,30].…”
Section: Image Captioning (mentioning)
Confidence: 99%