Exploring Pixel-level Self-supervision for Weakly Supervised Semantic Segmentation

Yoon, Sung-Hoon; Kweon, Hyeokjun; Jeong, Jaeseok; Kim, Hyeonseong; Kim, Shinjeong; Yoon, Kuk-Jin

doi:10.48550/arxiv.2112.05351

Cited by 1 publication

(3 citation statements)

References 56 publications


“…According to the above steps, we obtain a CAM generative model trained by source samples and image-level classification labels, after which the two conventional post-processing steps are followed: (1) CAM regions are selected as seed regions by thresholding [11]; (2) the seed regions are expanded into the final pseudo-labels [18]. The visualization results are shown in Experimental Results and Discussion.…”

Section: unknown (mentioning)

confidence: 99%
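The first post-processing step in the statement above (selecting CAM regions as seeds by a threshold) can be illustrated with a minimal sketch. This is not the citing paper's implementation: the function name `cam_to_seed_mask`, the min-max normalization, and the threshold values are assumptions chosen for illustration only.

```python
def cam_to_seed_mask(cam, threshold=0.3):
    """Turn a 2D class activation map into a binary seed mask.

    Assumption: the CAM is min-max normalized to [0, 1] first, and a pixel
    is a seed when its normalized activation is at or above `threshold`.
    The default of 0.3 is a hypothetical value, not taken from the source.
    """
    flat = [value for row in cam for value in row]
    lowest, highest = min(flat), max(flat)
    span = highest - lowest
    if span == 0:
        # A constant map carries no localization signal, so no seeds.
        return [[0 for _ in row] for row in cam]
    return [
        [1 if (value - lowest) / span >= threshold else 0 for value in row]
        for row in cam
    ]


# Toy 3x3 activation map for a single class.
cam = [
    [0.0, 0.1, 0.2],
    [0.1, 0.8, 0.5],
    [0.0, 0.4, 1.0],
]
seed = cam_to_seed_mask(cam, threshold=0.5)
```

The second step, expanding seeds into the final pseudo-labels, depends on the refinement method cited as [18] and is not sketched here.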

“…Also, class activation mapping (CAM) [16] is an effective solution for generating pixel-level pseudo-labels from image-level classification labels. However, because classifiers operate in a discriminative mode [17,18] and these labels contain limited spatial details [19], CAM often yields only local activation regions [20], and the segmented object boundaries easily involve false activation. They thus will cause different degrees of fragmentary masks [21].…”

Section: Introduction (mentioning)

confidence: 99%

“…They thus will cause different degrees of fragmentary masks [21]. A lot of recent work has refined the quality of CAM by mining more semantic and object location information from limited annotation information [10,11,18]. The success of these methods depends on the long-range dependencies [22] between pixels in an image.…”

Section: Introduction (mentioning)

confidence: 99%


Activation extending based on long-range dependencies for weakly supervised semantic segmentation

Liu, Zhao, Wang, et al. 2023

PLoS ONE


Weakly supervised semantic segmentation (WSSS) principally obtains pseudo-labels from class activation maps (CAM) to avoid expensive annotation. However, CAM easily produces false and local activations due to the lack of annotation information. This paper treats weakly supervised learning as semantic information mining to extend the object mask. We propose a novel architecture that mines semantic information by modeling long-range dependencies both within and across samples. To limit the confusion caused by long-range dependencies, the images are divided into blocks, and self-attention is carried out on each block, where fewer classes are present, to obtain long-range dependencies while reducing false predictions. Moreover, we perform global-to-local weighted self-supervised contrastive learning among image blocks, so that the local activation of CAM is transferred to different foreground areas. Experiments verify that these modules capture superior semantic details and more reliable pseudo-labels. On PASCAL VOC 2012, the proposed model achieves 76.6% and 77.4% mIoU on the val and test sets, respectively, which is superior to the comparison baselines.
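The mIoU figures in the abstract are mean intersection-over-union scores. As a rough sketch of how the metric is defined, the snippet below computes it for one pair of label maps. The function name `mean_iou` and the toy data are assumptions for illustration. Benchmark evaluation typically accumulates per-class counts over the whole dataset before averaging, which this single-image sketch does not do.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the target.

    `pred` and `target` are 2D lists of integer class ids of equal shape.
    Classes absent from both maps are skipped rather than counted as zero
    (an assumption; evaluation conventions differ on this point).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Agreement adds one pixel to both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Disagreement enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x2 example with two classes (0 = background, 1 = object).
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.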
