2021
DOI: 10.48550/arxiv.2102.08005
Preprint

TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation

Abstract: U-Net based convolutional neural networks with deep feature representation and skip-connections have significantly boosted the performance of medical image segmentation. In this paper, we study the more challenging problem of improving efficiency in modeling global contexts without losing localization ability for low-level details. TransFuse, a novel two-branch architecture, is proposed, which combines Transformers and CNNs in a parallel style. With TransFuse, both global dependency and low-level spatial detail…
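The abstract describes a parallel two-branch design: a CNN branch that preserves low-level spatial detail and a Transformer branch that models global context, fused afterwards. The NumPy sketch below is an illustrative toy, not the paper's implementation: the "CNN" is a 3x3 mean filter, the "Transformer" is a single self-attention pass over all positions, and the fusion is a simple average.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_branch(x):
    """Stand-in for a CNN branch: a 3x3 mean filter that keeps spatial layout."""
    h, w, c = x.shape
    out = np.zeros_like(x)
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    for i in range(h):
        for j in range(w):
            out[i, j] = pad[i:i + 3, j:j + 3].mean(axis=(0, 1))
    return out

def transformer_branch(x):
    """Stand-in for a Transformer branch: global self-attention over all positions."""
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)                 # flatten spatial grid to tokens
    scores = tokens @ tokens.T / np.sqrt(c)      # scaled dot-product similarity
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)     # softmax over all positions
    return (attn @ tokens).reshape(h, w, c)      # every output attends to every input

x = rng.normal(size=(8, 8, 16))
fused = 0.5 * cnn_branch(x) + 0.5 * transformer_branch(x)  # naive parallel fusion
print(fused.shape)  # (8, 8, 16)
```

The point of the parallel style, as opposed to stacking a Transformer on top of a CNN, is that neither branch's output depends on the other, so global context and local detail are computed independently and merged at the end.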

Cited by 40 publications (48 citation statements) | References 36 publications
“…The overall architecture of TransUNet is similar to that of U-Net [27], where convnets act as feature extractors and transformers help encode the global context. In fact, a major feature of TransUNet and most of its followers [4,5,32,41] is to treat convnets as the main bodies, on top of which transformers are further applied to capture long-term dependencies. However, such a characteristic may cause a problem: the advantages of transformers are not fully exploited.…”
Section: Introduction
confidence: 99%
“…With skip-connection incorporated, TransUNet sets new records (at the time of publication) on the Synapse multi-organ segmentation dataset [156] and the Automated Cardiac Diagnosis Challenge (ACDC) [155]. In other work, Zhang et al. propose TransFuse [157] to effectively fuse features from the Transformer and CNN layers via a BiFusion module. The BiFusion module leverages self-attention and a multi-modal fusion mechanism to selectively fuse the features.…”
Section: Hybrid Architectures
confidence: 99%
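The citation above says the BiFusion module uses self-attention and a multi-modal fusion mechanism to selectively combine CNN and Transformer features. The sketch below is a simplified, hypothetical reading of that idea, not the published BiFusion definition: each branch is reweighted by a squeeze-and-excite style channel gate, a Hadamard product captures fine-grained cross-branch interaction, and the three tensors are concatenated.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_gate(feat):
    """Squeeze-and-excite style channel attention: reweight channels by global stats."""
    pooled = feat.mean(axis=(0, 1))          # global average pool -> (C,)
    return feat * sigmoid(pooled)            # per-channel gating, broadcast over H, W

def bifusion_sketch(cnn_feat, trans_feat):
    """Gate each branch, mix them multiplicatively, and concatenate along channels."""
    gated_c = channel_gate(cnn_feat)
    gated_t = channel_gate(trans_feat)
    hadamard = cnn_feat * trans_feat         # fine-grained cross-branch interaction
    return np.concatenate([gated_c, gated_t, hadamard], axis=-1)

rng = np.random.default_rng(1)
c_feat = rng.normal(size=(8, 8, 32))         # CNN branch features
t_feat = rng.normal(size=(8, 8, 32))         # Transformer branch features
fused = bifusion_sketch(c_feat, t_feat)
print(fused.shape)  # (8, 8, 96)
```

The gating makes the fusion selective: channels with weak global statistics contribute less, so neither branch can simply dominate the fused representation.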
“…Extensive evaluation of TransFuse on multiple modalities (2D and 3D), including polyp segmentation, skin lesion segmentation, hip segmentation, and prostate segmentation, demonstrates its efficacy. Both TransUNet [96] and TransFuse [157] require pre-training on the ImageNet dataset [158] to effectively learn the positional encoding of the images. To learn this positional bias without any pre-training, Valanarasu et al. [128] propose a modified gated axial attention layer [159] that works well on small medical image segmentation datasets.…”
Section: Hybrid Architectures
confidence: 99%
“…TransUNet [34] employs ViT as the encoder with large-scale pre-training for medical image segmentation. TransFuse [35] combines Transformer and CNN in parallel to effectively capture global dependencies and spatial details in a much shallower manner. Besides, MedT [36] proposes a Gated Axial-Attention model, which applies an additional control mechanism in the self-attention module to extend existing architectures.…”
Section: B. Transformer-Based Approaches, 1) Transformer for Various Vi…
confidence: 99%
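The gated axial attention mentioned in the last excerpt restricts attention to one spatial axis at a time and adds a learnable gate that controls how much the positional terms contribute, which helps when positional biases cannot be learned reliably from small datasets. The following is a heavily simplified sketch under that assumption (a single gate scalar on one additive positional bias, row-axis attention only), not MedT's actual layer:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gated_axial_attention_rows(x, pos_bias, gate):
    """Self-attention along the width axis only, with a gated positional bias.

    `gate` scales how much the (possibly unreliable) positional term contributes;
    near 0 the layer falls back to pure content-based attention.
    """
    h, w, c = x.shape
    out = np.empty_like(x)
    for i in range(h):
        row = x[i]                               # (w, c) tokens in one row
        scores = row @ row.T / np.sqrt(c)        # content term
        scores = scores + gate * pos_bias        # gated positional term
        out[i] = softmax(scores) @ row
    return out

rng = np.random.default_rng(2)
x = rng.normal(size=(4, 6, 8))
pos_bias = rng.normal(size=(6, 6))               # relative position scores (assumed)
y = gated_axial_attention_rows(x, pos_bias, gate=0.1)
print(y.shape)  # (4, 6, 8)
```

Attending along one axis at a time reduces the attention cost from O((HW)^2) to O(HW·W) per axis, which is the usual motivation for axial attention on images.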