2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01649

Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation

Cited by 103 publications (60 citation statements)
References 35 publications
“…Specifically, Ours-L achieves 69.2% and 70.6% mIoU on the PASCAL VOC val set with DeepLabV2 initialized with ImageNet and MS COCO pre-trained weights, respectively, which recover 90.7% and 91.0% of the upper bound of their fully-supervised counterparts. Our method also achieves comparable performance with recent state-of-the-art WSSS methods using extra saliency maps, such as NSROM (Yao et al., 2021), DRS (Kim et al., 2021), EPS (Lee et al., 2021c), AuxSegNet (Xu et al., 2021), and EDAM (Wu et al., 2021). Our method also outperforms recent methods with superior backbone networks, such as PMM (Li et al., 2021b), which uses Res2Net101 (Gao et al., 2021) as the backbone for semantic segmentation.…”
Section: Image
confidence: 57%
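
As a quick check on the quoted recovery figures, the fully supervised upper bounds implied by the statement follow directly from its numbers. A back-of-the-envelope Python sketch (the upper-bound values themselves are not stated in the excerpt, only implied):

    # Recovery = weakly supervised mIoU / fully supervised upper bound,
    # so the implied upper bound is mIoU / recovery.
    for miou, recovery in [(69.2, 0.907), (70.6, 0.910)]:
        upper = miou / recovery
        print(f"{miou:.1f} mIoU at {recovery:.1%} recovery -> upper bound ~ {upper:.1f} mIoU")

This yields roughly 76.3 and 77.6 mIoU, consistent with typical fully supervised DeepLabV2 results on PASCAL VOC.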
“…To make a fair comparison, we follow SEAM [31], PuzzleCAM [13], and AdvCAM [18] to adopt PSA [2] for initial CAM refinement.…”

The comparison table flattened into this statement reads as follows (apparently PASCAL VOC 2012 val/test mIoU, %; Seg. = segmentation network; V16 = VGG16, R50 = ResNet-50, WR38 = WideResNet-38; the method name of the first row is truncated in the excerpt):

    Method      Venue           Seg.    Backbone   val    test
    —           [28]            V1 ‡    V16        66.2   66.9
    LIID        TPAMI'21 [21]   V2      R50        66.5   67.5
    NSROM       CVPR'21 [35]    V2 ‡    V16        68.3   68.5
    DRS         AAAI'21 [14]    V2 ‡    V16        70.4   70.7
    EPS         CVPR'21 [19]    V2 ‡    WR38       70.9   70.8
    EDAM        CVPR'21 [32]    V2 ‡    WR38       70.9   70.6
    AuxSegNet   ICCV'21 [34]    —       WR38       69.0   68.6

Section: Methods
confidence: 99%
“…This semantic affinity is then applied to refine the generated initial CAMs into pseudo ground-truth masks. Previous works [12,19,21,32] instead use additional saliency maps from a fully supervised saliency detector to refine the generated initial CAMs. The DeepLab [5,6] series of models is typically used to train a semantic segmentation network with the pseudo ground-truth masks.…”
Section: Related Work
confidence: 99%
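
The affinity-based refinement that PSA [2] performs is, at its core, a random walk over pixels: pairwise semantic affinities are row-normalized into a transition matrix, which is applied to the flattened CAM scores for a few iterations so that activation spreads to semantically similar pixels. A minimal NumPy sketch of the idea (the shapes, iteration count, and toy affinity are illustrative assumptions, not the exact PSA settings):

    import numpy as np

    def refine_cam_with_affinity(cam, affinity, n_iters=3):
        """Random-walk refinement of a CAM with a pixel-pairwise affinity matrix.

        cam:      (H, W) class activation map.
        affinity: (H*W, H*W) non-negative semantic affinity between pixel pairs.
        """
        # Row-normalize the affinity into a transition matrix.
        trans = affinity / np.maximum(affinity.sum(axis=1, keepdims=True), 1e-8)
        scores = cam.reshape(-1)
        for _ in range(n_iters):
            scores = trans @ scores  # propagate activation to similar pixels
        return scores.reshape(cam.shape)

    # Toy usage: an identity affinity is a degenerate walk that leaves the CAM unchanged.
    cam = np.random.rand(4, 4)
    assert np.allclose(refine_cam_with_affinity(cam, np.eye(16)), cam)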
“…Several studies exploit region masks as pseudo-labels for semantic segmentation [47,52,59]; however, the proposed method is advantageous in terms of the quality and stability of the self-supervision. Instead of using fixed sources of regional information, such as an off-the-shelf saliency module [47] or a pre-trained classifier [36], we obtain region masks from the CAMs of SupportNet.…”
Section: Regional Contrastive Module (RCM)
confidence: 99%
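
Obtaining region masks from the CAMs of SupportNet, as described above, typically reduces to normalizing each class map and thresholding it. A hedged sketch of that step (the min-max normalization and the 0.4 threshold are illustrative assumptions, not the paper's exact procedure):

    import numpy as np

    def cam_to_region_mask(cam, threshold=0.4):
        """Binarize a single-class CAM into a boolean region mask."""
        # Min-max normalize so the threshold is independent of activation scale.
        cam = cam - cam.min()
        cam = cam / max(cam.max(), 1e-8)
        return cam >= threshold  # (H, W) boolean foreground mask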
“…Moreover, we separate the network providing self-supervision for object localization (SupportNet) from the network learning from that guidance (MainNet), using EMA [18]. This enables more stable training than methods whose backbone is updated by self-supervision from itself [52,59]. Therefore, the acquired self-supervision is not only continually revised as training proceeds but also stably delivered to the MainNet.…”
Section: Regional Contrastive Module (RCM)
confidence: 99%
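
The EMA [18] coupling described here is the standard mean-teacher update: SupportNet's weights track an exponential moving average of MainNet's, so the source of self-supervision evolves slowly instead of chasing the network it supervises. A minimal PyTorch-style sketch (the decay value and function name are assumptions for illustration):

    import torch

    @torch.no_grad()
    def ema_update(support_net, main_net, decay=0.999):
        """SupportNet <- decay * SupportNet + (1 - decay) * MainNet, parameter-wise."""
        for p_s, p_m in zip(support_net.parameters(), main_net.parameters()):
            p_s.mul_(decay).add_(p_m, alpha=1 - decay)

Because only MainNet receives gradients, SupportNet's averaged weights produce region masks that change gradually between iterations, which is what stabilizes the self-supervision.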