2022
DOI: 10.1109/tim.2022.3178991

DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation

Cited by 324 publications (153 citation statements)
References 36 publications

“…Our quantitative results on the CVC-ClinicDB dataset achieve SOTA performance compared to other models, as shown in Table II. Our model achieves a mDice of 0.9523, which corresponds to a 1.01% improvement in mDice over the best-performing DS-TransUNet-L [11]. We achieve a mIoU of 0.9130, which corresponds to an improvement of 0.87% over SOTA MSRF-Net [37].…”
Section: Comparison on Kvasir-SEG
confidence: 64%
“…Lin et al. [20] propose a dual-scale semantic segmentation model based on the Swin Transformer, using the self-attention mechanism to construct long-distance feature relationships between different scales. They validate the model on several medical datasets, where it obtains better results.…”
Section: CNN-Based Segmentation Network
confidence: 99%
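The excerpt above describes the core idea of the cited dual-scale design: two token streams at different patch scales exchange information through attention. The following is a minimal sketch of that idea, not the authors' implementation; the module name, the bidirectional cross-attention scheme, the mean-pooled global-context shortcut, and all shapes are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): fusing features from two
# Swin-style branches at different patch scales via multi-head cross-attention.
import torch
import torch.nn as nn

class DualScaleFusion(nn.Module):  # hypothetical module name
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Cross-attention in both directions between the two scales.
        self.fine_to_coarse = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.coarse_to_fine = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine:   (B, N_fine, C)   tokens from the small-patch branch
        # coarse: (B, N_coarse, C) tokens from the large-patch branch
        f, _ = self.fine_to_coarse(fine, coarse, coarse)  # fine queries coarse
        c, _ = self.coarse_to_fine(coarse, fine, fine)    # coarse queries fine
        # Simplification: mean-pool the coarse stream into one global context
        # vector and broadcast-add it, instead of spatially upsampling tokens.
        return self.norm(f + c.mean(dim=1, keepdim=True))  # (B, N_fine, C)

# Usage: 56x56 fine tokens and 28x28 coarse tokens with embedding dim 96.
fusion = DualScaleFusion(dim=96)
out = fusion(torch.randn(2, 3136, 96), torch.randn(2, 784, 96))
print(out.shape)  # torch.Size([2, 3136, 96])
```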
“…Typically, 3D CNNs are employed along the feature channels to estimate a probability function over different depth values [8,34]. Recently, shifted window Transformers were shown to enable local feature aggregation while maintaining long-range cross interaction, surpassing CNNs across different vision tasks [4,12,13,15,16]. Interestingly, while learned MVS methods aim to estimate the likelihood of depth hypotheses from multi-view feature consistency, they calculate the absolute error between ground truth and predicted depth expectation without geometrical consistency supervision [2,8,34].…”
Section: Introduction
confidence: 99%
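This excerpt mentions how learned MVS methods regress depth: a 3D CNN produces a probability over depth hypotheses per pixel, the prediction is the expectation over those hypotheses, and supervision is the absolute error against ground truth. Below is a minimal sketch of that regression-and-loss step under assumed shapes; the depth range, tensor sizes, and function name are illustrative, and the cost-volume network is stubbed with random logits.

```python
# Minimal sketch (assumptions noted inline) of soft-argmax depth regression
# with absolute-error supervision, as commonly used in learned MVS.
import torch

def expected_depth(prob: torch.Tensor, hypotheses: torch.Tensor) -> torch.Tensor:
    # prob:       (B, D, H, W) softmax probabilities over D depth hypotheses
    # hypotheses: (D,) candidate depth values
    # Expectation over hypotheses gives a sub-hypothesis-resolution depth map.
    return (prob * hypotheses.view(1, -1, 1, 1)).sum(dim=1)  # (B, H, W)

B, D, H, W = 2, 48, 64, 80
logits = torch.randn(B, D, H, W)        # stand-in for 3D-CNN cost-volume output
prob = logits.softmax(dim=1)            # per-pixel distribution over depths
hyps = torch.linspace(425.0, 935.0, D)  # assumed depth range (e.g. DTU-like)
depth = expected_depth(prob, hyps)
gt = torch.full((B, H, W), 600.0)       # dummy ground-truth depth
loss = (depth - gt).abs().mean()        # absolute error on depth expectation
print(depth.shape, float(loss))
```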