DeepOpht: Medical Report Generation for Retinal Images via Deep Models and Visual Explanation

Huang, Jia-Hong; Yang, Chao-Han Huck; Liu, Fangyu; Tian, Meng; Liu, Yi-Chieh; Wu, Ting-Wei; Lin, I-Hung; Wang, Kang; Morikawa, Hiromasa; Chang, Herng-Hua; Tegnér, Jesper; Worring, Marcel

doi:10.1109/wacv48630.2021.00249

Cited by 31 publications

(37 citation statements)

References 47 publications


“…According to [1,2], retinal diseases, e.g., Diabetic Retinopathy (DR) and Age-related Macular Degeneration (AMD) are expected to affect over 500 million people worldwide. So, the workload of ophthalmologists will be overwhelming.…”

Section: Introduction (mentioning)

confidence: 99%

“…So, the workload of ophthalmologists will be overwhelming. Automating part of the retinal disease diagnosis procedure [1], such as medical report generation for retinal images, is one of the good ways to help them reduce the workload.…”

Section: Introduction (mentioning)

confidence: 99%

“…These proposed approaches work on image content only because they are mainly based on traditional natural image captioning models [8,9]. However, it is hard to generate abstract medical concepts or descriptions, [3,4], i.e., key components of medical reports [1], only based on image information. To address this issue, the authors of [1,10] have proposed a context-driven, i.e., in the form of keywords sequence, medical report generation method for retinal images.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, it is hard to generate abstract medical concepts or descriptions, [3,4], i.e., key components of medical reports [1], only based on image information. To address this issue, the authors of [1,10] have proposed a context-driven, i.e., in the form of keywords sequence, medical report generation method for retinal images. Since the context-driven method has multi-modal inputs, i.e., the keywords and image, the authors of [1] exploit the average method to fuse the multi-modal information.…”

Section: Introduction (mentioning)

confidence: 99%

“…To address this issue, the authors of [1,10] have proposed a context-driven, i.e., in the form of keywords sequence, medical report generation method for retinal images. Since the context-driven method has multi-modal inputs, i.e., the keywords and image, the authors of [1] exploit the average method to fuse the multi-modal information. However, fusing the multi-modal information by the average method in this case probably cannot effectively capture the interactive information between the context and image [7,6,11,12,13,14,15,16,17,18,19,20,21,22,23,1].…”

Section: Introduction (mentioning)

confidence: 99%
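
The "average method" of fusing keyword and image information mentioned in the statement above can be illustrated with a minimal sketch. This is not the cited authors' implementation: the function names, the toy two-dimensional features, and the choice to average the keyword embeddings before averaging them with the image feature are all assumptions made for illustration.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (hypothetical helper)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def average_fusion(
    image_feature: Sequence[float],
    keyword_embeddings: Sequence[Sequence[float]],
) -> List[float]:
    """Fuse one image feature with a keyword sequence by plain averaging.

    Assumption: keywords are first pooled into a single vector, then that
    vector is averaged element-wise with the image feature.
    """
    keyword_feature = mean_vector(keyword_embeddings)
    return mean_vector([image_feature, keyword_feature])


# Toy example: one image feature and two keyword embeddings.
fused = average_fusion([1.0, 3.0], [[3.0, 1.0], [5.0, 3.0]])
print(fused)  # [2.5, 2.5]
```

In this sketch each output component depends only on the same component of each input, with fixed weights, so no term couples the image with the keywords. That is one way to read the quoted concern that averaging may not capture interactive information between context and image.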


Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"

Huang, Wu, Yang, et al. 2021. Preprint. Self Cite.

Automatically generating medical reports for retinal images is one of the promising ways to help ophthalmologists reduce their workload and improve work efficiency. In this work, we propose a new context-driven encoding network to automatically generate medical reports for retinal images. The proposed model is mainly composed of a multi-modal input encoder and a fused-feature decoder. Our experimental results show that our proposed method is capable of effectively leveraging the interactive information between the input image and context, i.e., keywords in our case. The proposed method creates more accurate and meaningful reports for retinal images than baseline models and achieves state-of-the-art performance. This performance is shown in several commonly used metrics for the medical report generation task: BLEUavg (+16%), CIDEr (+10.2%), and ROUGE (+8.6%).


Recent advances of Transformers in medical image analysis: A comprehensive review

Xia, Wang. 2023. MedComm – Future Medicine.

Recent works have shown that Transformer's excellent performances on natural language processing tasks can be maintained on natural image analysis tasks. However, the complicated clinical settings in medical image analysis and varied disease properties bring new challenges for the use of Transformer. The computer vision and medical engineering communities have devoted significant effort to medical image analysis research based on Transformer with especial focus on scenario-specific architectural variations. In this paper, we comprehensively review this rapidly developing area by covering the latest advances of Transformer-based methods in medical image analysis of different settings. We first give introduction of basic mechanisms of Transformer including implementations of self-attention and typical architectures. The important research problems in various medical image data modalities, clinical visual tasks, organs and diseases are then reviewed systemically. We carefully collect 276 very recent works and 76 public medical image analysis datasets in an organized structure. Finally, discussions on open problems and future research directions are also provided. We expect this review to be an up-to-date roadmap and serve as a reference source in pursuit of boosting the development of medical image analysis field.
