2018
DOI: 10.1007/978-3-030-00928-1_52

Multimodal Recurrent Model with Attention for Automated Radiology Report Generation

Cited by 128 publications (99 citation statements); references 16 publications. Citing publications span 2019–2023.

“…Chest Radiographic Observations: The task is formulated as multi-label classification over 14 common radiographic observations, following [5]: enlarged cardiomediastinum, cardiomegaly, lung opacity, lung lesion, edema, consolidation, pneumonia, atelectasis, pneumothorax, pleural effusion, pleural other, fracture, support devices, and no finding. Compared with previous studies using encoders pretrained on ImageNet [6,14], pretraining with images from the same domain yields better results. We add one fully-connected layer as the classifier and compute the binary cross-entropy (BCE) loss.…”
Section: Image Encoder (mentioning; confidence: 64%)
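
To make the quoted setup concrete, here is a minimal PyTorch sketch of a pretrained image encoder with a single fully-connected head for 14-label multi-label classification under a BCE loss. The ResNet-50 backbone, class names, and dimensions are illustrative assumptions, not the cited paper's actual code.

import torch
import torch.nn as nn
import torchvision.models as models

NUM_OBSERVATIONS = 14  # the 14 radiographic observations listed above

class ChestXrayClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        backbone.fc = nn.Identity()               # keep the 2048-d pooled features
        self.encoder = backbone
        self.classifier = nn.Linear(2048, NUM_OBSERVATIONS)  # single FC head

    def forward(self, x):                          # x: (B, 3, H, W)
        return self.classifier(self.encoder(x))   # raw logits, one per label

model = ChestXrayClassifier()
criterion = nn.BCEWithLogitsLoss()                 # BCE over independent labels

images = torch.randn(4, 3, 224, 224)               # dummy batch
targets = torch.randint(0, 2, (4, NUM_OBSERVATIONS)).float()
loss = criterion(model(images), targets)
loss.backward()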
“…Radiology Report Generation: The evaluation metrics we use are BLEU [9], METEOR [2], and ROUGE [8] scores, all of which are widely used in image captioning and machine translation tasks. We compare the proposed model with several state-of-the-art baselines: (1) a visual-attention-based image captioning model (Vis-Att) [13]; (2) radiology report generation models, including a hierarchical decoder with co-attention (Co-Att) [6], a multimodal generative model with visual attention (MM-Att) [14], and knowledge-driven retrieval-based report generation (KERP) [7]; and (3) the proposed multi-view encoder with hierarchical decoder (MvH) model, the base model with visual attention and early fusion (MvH+AttE), MvH with late fusion (MvH+AttL), and the combination of late fusion with medical concepts (MvH+AttL+MC). MvH+AttL+MC* is an oracle run based on ground-truth medical concepts and is considered the upper bound of the improvement from applying medical concepts.…”
Section: Methods (mentioning; confidence: 99%)
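
For readers reproducing the quoted evaluation, this is a minimal sketch of scoring one generated report against its reference with the three named metrics, using the NLTK and rouge-score packages (METEOR additionally requires nltk.download('wordnet'); the example strings are made up):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

reference = "no acute cardiopulmonary abnormality"   # illustrative strings
hypothesis = "no acute cardiopulmonary process"
ref_tok, hyp_tok = reference.split(), hypothesis.split()

# BLEU-4 on one sentence pair; smoothing avoids zero scores when
# short reports miss higher-order n-gram matches.
bleu = sentence_bleu([ref_tok], hyp_tok,
                     smoothing_function=SmoothingFunction().method1)

# METEOR (recent NLTK versions expect pre-tokenized input).
meteor = meteor_score([ref_tok], hyp_tok)

# ROUGE-L F-measure via the rouge-score package.
rouge = rouge_scorer.RougeScorer(["rougeL"]).score(reference, hypothesis)
print(f"BLEU-4 {bleu:.3f}  METEOR {meteor:.3f}  "
      f"ROUGE-L {rouge['rougeL'].fmeasure:.3f}")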
“…Typically this involves conditioning a recurrent neural network (RNN) on image features encoded by a convolutional neural network (CNN). This method has shown great promise in non-specific image captioning tasks but has not generalized well to the complex domain of medical images [18,19].…”
Section: Introduction (mentioning; confidence: 99%)
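
The CNN-to-RNN conditioning described in the quote can be sketched in a few lines of PyTorch; the dimensions, names, and teacher-forcing setup here are illustrative assumptions, not any specific paper's implementation:

import torch
import torch.nn as nn

class CnnRnnCaptioner(nn.Module):
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, embed_dim)  # project CNN features
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, cnn_feats, captions):
        # Feed the projected image feature as the first "token", then
        # teacher-force the ground-truth caption through the LSTM.
        img_tok = self.feat_proj(cnn_feats).unsqueeze(1)   # (B, 1, E)
        seq = torch.cat([img_tok, self.embed(captions)], dim=1)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                            # (B, T+1, V) logits

model = CnnRnnCaptioner(vocab_size=1000)
feats = torch.randn(2, 2048)             # pooled CNN features for 2 images
caps = torch.randint(0, 1000, (2, 12))   # 12-token reports
logits = model(feats, caps)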
“…For example, a pathologist must interpret a set of 8 different renal biopsy sections of the same patient to report the RDIF assay. Several methods have been proposed to enable captioning of multiple images by assuming images in the set exhibit temporal dependence [4] or contain multiple views/instances of the same object [18]. These assumptions are unsuitable for the multi-object temporally independent RDIF set.…”
Section: Introduction (mentioning; confidence: 99%)
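
A small sketch can make the two quoted assumptions concrete: encoding the image set with an RNN bakes in an ordering (temporal dependence), while permutation-invariant pooling treats the images as interchangeable views of one object. Variable names and dimensions are illustrative assumptions:

import torch
import torch.nn as nn

feats = torch.randn(8, 2048)   # per-image CNN features for a set of 8 images

# (a) Temporal-dependence assumption: a GRU consumes the set in order,
# so the code it produces depends on the (possibly arbitrary) ordering.
rnn = nn.GRU(2048, 512, batch_first=True)
_, seq_code = rnn(feats.unsqueeze(0))   # final hidden state: (1, 1, 512)

# (b) Multi-view assumption: mean pooling is permutation-invariant and
# treats the images as interchangeable views of one object.
view_code = feats.mean(dim=0)           # (2048,), order-free

# Neither assumption fits a temporally independent, multi-object set
# such as the 8 RDIF biopsy sections described above.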