2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.00545

Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation

Cited by 162 publications (109 citation statements)
References 39 publications
“…Specifically, Ours-L achieves 69.2% and 70.6% mIoU on the PASCAL VOC val set with DeepLabV2 initialized with ImageNet and MS COCO pre-trained weights, respectively, which recover 90.7% and 91.0% of the upper bound of their fully-supervised counterparts. Our methods also achieve comparable performance with recent state-of-the-art WSSS methods using extra saliency maps, such as NSROM (Yao et al, 2021), DRS (Kim et al, 2021), EPS (Lee et al, 2021c), AuxSegNet (Xu et al, 2021), and EDAM (Wu et al, 2021). Our method also outperforms recent methods with superior backbone networks, such as PMM (Li et al, 2021b), which uses Res2Net101 (Gao et al, 2021) as the backbone for semantic segmentation.…”
Section: Image (mentioning)
confidence: 57%
“…MS COCO 2014 dataset (Lin et al, 2014) is a large-scale dataset with 81 semantic categories, including the background class. After excluding the images without annotations (Lee et al, 2021c), the MS COCO dataset consists of 82,081 and 40,137 images in train and val set, respectively. Classification Network.…”
Section: Implementation Details (mentioning)
confidence: 99%
“…Image-level labels are probably the most popular form of weak supervision, due to their simplicity and the possibility of obtaining them from public datasets or web data. A typical WSSS pipeline begins with generating a pseudo mask, followed by training a new semantic segmentation network [24]. Interpretability techniques such as CAM [38] are often used to infer incomplete pixel-level annotations automatically.…”
Section: Related Work (mentioning)
confidence: 99%
“…As an additional guidance for network to pay attention to the entire region of objects, some existing works attempt to devise auxiliary tasks such as sub-category classification [3], self-equivariant regularization with scale variance minimization [49], class-wise co-attention extraction [33,42], anti-adversarial attack [28], and complementary patch loss [60]. Many WSSS methods [15,16,21,30,33,42,55,56] have been proposed to employ the pre-trained saliency detection module, which distinguishes dominant foreground object from its background, as a complementary source of information for enhancing CAMs and generating precise pseudo-pixel labels.…”
Section: Related Work (mentioning)
confidence: 99%