Qihang Yu scite author profile

Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard and achieved tremendous success. However, due to the intrinsic locality of convolution operations, U-Net generally demonstrates limitations in explicitly modeling long-range dependency. Transformers, designed for sequence-to-sequence prediction, have emerged as alternative architectures with innate global self-attention mechanisms, but can result in limited localization abilities due to insufficient low-level details. In this paper, we propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation. On one hand, the Transformer encodes tokenized image patches from a convolution neural network (CNN) feature map as the input sequence for extracting global contexts. On the other hand, the decoder upsamples the encoded features which are then combined with the high-resolution CNN feature maps to enable precise localization. We argue that Transformers can serve as strong encoders for medical image segmentation tasks, with the combination of U-Net to enhance finer details by recovering localized spatial information. TransUNet achieves superior performances to various competing methods on different medical applications including multi-organ segmentation and cardiac segmentation. Code and models are available at https://github.com/Beckschen/TransUNet.

Recurrent Saliency Transformation Network: Incorporating Multi-stage Visual Cues for Small Organ Segmentation

Xie

Wang

et al. 2018

196

201

We aim at segmenting small organs (e.g., the pancreas) from abdominal CT scans. As the target often occupies a relatively small region in the input image, deep neural networks can be easily confused by the complex and variable background. To alleviate this, researchers proposed a coarse-to-fine approach [46], which used prediction from the first (coarse) stage to indicate a smaller input region for the second (fine) stage. Despite its effectiveness, this algorithm dealt with two stages individually, which lacked optimizing a global energy function, and limited its ability to incorporate multi-stage visual cues. Missing contextual information led to unsatisfying convergence in iterations, and the fine stage sometimes produced even lower segmentation accuracy than the coarse stage. This paper presents a Recurrent Saliency Transformation Network. The key innovation is a saliency transformation module, which repeatedly converts the segmentation probability map from the previous iteration into spatial weights and applies these weights to the current iteration. This brings us two-fold benefits. In training, it allows joint optimization over the deep networks dealing with different input scales. In testing, it propagates multi-stage visual information throughout iterations to improve segmentation accuracy. Experiments in the NIH pancreas segmentation dataset demonstrate the state-of-the-art accuracy, which outperforms the previous best by an average of over 2%. Much higher accuracies are also reported on several small organs in a larger dataset collected by ourselves. In addition, our approach enjoys better convergence properties, making it more efficient and reliable in practice.

When Radiology Report Generation Meets Knowledge Graph

Zhang

Wang

et al. 2020

AAAI

145

Automatic radiology report generation has been an attractive research problem towards computer-aided diagnosis to alleviate the workload of doctors in recent years. Deep learning techniques for natural image captioning are successfully adapted to generating radiology reports. However, radiology image reporting is different from the natural image captioning task in two aspects: 1) the accuracy of positive disease keyword mentions is critical in radiology image reporting in comparison to the equivalent importance of every single word in a natural image caption; 2) the evaluation of reporting quality should focus more on matching the disease keywords and their associated attributes instead of counting the occurrence of N-gram. Based on these concerns, we propose to utilize a pre-constructed graph embedding module (modeled with a graph convolutional neural network) on multiple disease findings to assist the generation of reports in this work. The incorporation of a knowledge graph allows for dedicated feature learning for each disease finding and the relationship modeling between them. In addition, we propose a new evaluation metric for radiology image reporting with the assistance of the same composed graph. Experimental results demonstrate the superior performance of the methods integrated with the proposed graph embedding module on a publicly accessible dataset (IU-RR) of chest radiographs compared with previous approaches using both the conventional evaluation metrics commonly adopted for image captioning and our proposed ones.

C2FNAS: Coarse-to-Fine Neural Architecture Search for 3D Medical Image Segmentation

Yang

Roth

et al. 2020

123

Mask Guided Matting via Progressive Refinement Network

Zhang

et al. 2021

Application of Deep Learning to Pancreatic Cancer Detection: Lessons Learned From Our Initial Experience

Chu

Park

Kawamoto

et al. 2019

Journal of the American College of Radiology

Neural Architecture Search for Lightweight Non-Local Networks

Jin

Mei

et al. 2020

When Radiology Report Generation Meets Knowledge Graph

Zhang

Wang

et al. 2020

Preprint

Qihang Yu

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
