2022
DOI: 10.1109/tim.2022.3178991

DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation

Cited by 324 publications (153 citation statements)
References 36 publications

“…Our quantitative results on the CVC-ClinicDB dataset achieve SOTA performance compared to other models, as shown in Table II. Our model achieves a mDice of 0.9523, which corresponds to a 1.01% improvement in mDice over the best-performing DS-TransUNet-L [11]. We achieve a mIoU of 0.9130, which corresponds to an improvement of 0.87% over SOTA MSRF-Net [37].…”
Section: Comparison on Kvasir-SEG
confidence: 64%
“…Lin et al. [20] propose a dual-scale semantic segmentation model based on the Swin Transformer, using the self-attention mechanism to construct long-distance feature relationships between different scales. They validate the model on several medical datasets, where it obtains better results.…”
Section: CNN-Based Segmentation Network
confidence: 99%
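The excerpt above describes the core idea of the cited dual-scale design: two token streams at different patch scales exchange information through attention. The following is a minimal sketch of that idea, not the authors' implementation; the module name, the bidirectional cross-attention scheme, the mean-pooled global-context shortcut, and all shapes are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): fusing features from two
# Swin-style branches at different patch scales via multi-head cross-attention.
import torch
import torch.nn as nn

class DualScaleFusion(nn.Module):  # hypothetical module name
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Cross-attention in both directions between the two scales.
        self.fine_to_coarse = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.coarse_to_fine = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine:   (B, N_fine, C)   tokens from the small-patch branch
        # coarse: (B, N_coarse, C) tokens from the large-patch branch
        f, _ = self.fine_to_coarse(fine, coarse, coarse)  # fine queries coarse
        c, _ = self.coarse_to_fine(coarse, fine, fine)    # coarse queries fine
        # Simplification: mean-pool the coarse stream into one global context
        # vector and broadcast-add it, instead of spatially upsampling tokens.
        return self.norm(f + c.mean(dim=1, keepdim=True))  # (B, N_fine, C)

# Usage: 56x56 fine tokens and 28x28 coarse tokens with embedding dim 96.
fusion = DualScaleFusion(dim=96)
out = fusion(torch.randn(2, 3136, 96), torch.randn(2, 784, 96))
print(out.shape)  # torch.Size([2, 3136, 96])
```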
“…Typically, 3D CNNs are employed along the feature channels to estimate a probability function over different depth values [8,34]. Recently, shifted window Transformers were shown to enable local feature aggregation while maintaining long-range cross interaction, surpassing CNNs across different vision tasks [4,12,13,15,16]. Interestingly, while learned MVS methods aim to estimate the likelihood of depth hypotheses from multi-view feature consistency, they calculate the absolute error between ground truth and predicted depth expectation without geometrical consistency supervision [2,8,34].…”
Section: Introduction
confidence: 99%
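This excerpt mentions how learned MVS methods regress depth: a 3D CNN produces a probability over depth hypotheses per pixel, the prediction is the expectation over those hypotheses, and supervision is the absolute error against ground truth. Below is a minimal sketch of that regression-and-loss step under assumed shapes; the depth range, tensor sizes, and function name are illustrative, and the cost-volume network is stubbed with random logits.

```python
# Minimal sketch (assumptions noted inline) of soft-argmax depth regression
# with absolute-error supervision, as commonly used in learned MVS.
import torch

def expected_depth(prob: torch.Tensor, hypotheses: torch.Tensor) -> torch.Tensor:
    # prob:       (B, D, H, W) softmax probabilities over D depth hypotheses
    # hypotheses: (D,) candidate depth values
    # Expectation over hypotheses gives a sub-hypothesis-resolution depth map.
    return (prob * hypotheses.view(1, -1, 1, 1)).sum(dim=1)  # (B, H, W)

B, D, H, W = 2, 48, 64, 80
logits = torch.randn(B, D, H, W)        # stand-in for 3D-CNN cost-volume output
prob = logits.softmax(dim=1)            # per-pixel distribution over depths
hyps = torch.linspace(425.0, 935.0, D)  # assumed depth range (e.g. DTU-like)
depth = expected_depth(prob, hyps)
gt = torch.full((B, H, W), 600.0)       # dummy ground-truth depth
loss = (depth - gt).abs().mean()        # absolute error on depth expectation
print(depth.shape, float(loss))
```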