Bilateral Temporal Re-Aggregation for Weakly-Supervised Video Object Segmentation

Lin, Fanchao; Xie, Hongtao; Liu, Chuanbin; Zhang, Yongdong

doi:10.1109/tcsvt.2021.3127562

Cited by 13 publications

(4 citation statements)

References 64 publications


“…Therefore, the phenomenon of inconsistent segmentation results inside the object is easy to occur. Motivated by knowledge distillation methods [26]-[28], [60], [61], which are able to increase the accuracy of multi tasks via transferring the specific knowledge from each single task or increase the accuracy of small student network via transferring context information from large teacher model, a consistent constraint module is proposed to enrich the long-range dependency among pixels within non-key frame by distilling it from the feature obtained from key frame segmentation. It could promote semantic consistency of non-key frame, and do not add any computing burden at the same time.…”

Section: B Local Attention Based Module (mentioning)

confidence: 99%


Dual Correlation Network for Efficient Video Semantic Segmentation

Liao et al. 2024

IEEE Trans. Circuits Syst. Video Technol.


Video data bring a big challenge to semantic segmentation due to the large volume of data and strong interframe redundancy. In this paper, we propose a dual local and global correlation network tailored for efficient video semantic segmentation. It consists of three modules: 1) a local attention based module, which measures correlation and achieves feature aggregation in a local region between key frame and non-key frame; 2) a consistent constraint module, which considers long-range correlation among pixels from a global view for promoting intra-frame semantic consistency of non-key frame; and 3) a key frame decision module, which selects key frames adaptively based on the ability of feature transferring. Extensive experiments on the Cityscapes and CamVid video datasets demonstrate that our proposed method could reduce inference time significantly while maintaining high accuracy. The implementation is available at https://github.com/An01168/DCNVSS.
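The first module in the abstract above (local correlation followed by feature aggregation between key frame and non-key frame) can be illustrated with a minimal sketch. This is not the authors' implementation; the function name `local_attention_aggregate`, the `radius` parameter, the dot-product correlation, and the softmax weighting are all assumptions made for illustration.

```python
import math

def local_attention_aggregate(non_key, key, radius=1):
    """Aggregate key-frame features into each non-key-frame position.

    `non_key` and `key` are H x W grids of C-dimensional feature vectors
    (nested lists). For each position, correlation is measured only against
    key-frame positions within `radius` (an assumed window size), turned
    into softmax weights, and used to average the key-frame features.
    """
    height, width = len(key), len(key[0])
    channels = len(key[0][0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            query = non_key[y][x]
            neighbours, scores = [], []
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    candidate = key[ny][nx]
                    neighbours.append(candidate)
                    # Dot product as the correlation measure (an assumption).
                    scores.append(sum(q * k for q, k in zip(query, candidate)))
            peak = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            row.append([
                sum(w * n[c] for w, n in zip(weights, neighbours)) / total
                for c in range(channels)
            ])
        output.append(row)
    return output

# Hypothetical 2 x 2 feature grids with 2 channels each.
key_features = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
non_key_features = [[[0.9, 0.1], [0.1, 0.9]], [[1.0, 0.8], [0.0, 0.1]]]
aggregated = local_attention_aggregate(non_key_features, key_features, radius=1)
```

Restricting the search to a small window is what keeps the cost low compared with attending over the whole frame; with `radius=0` the sketch simply copies the key-frame feature at the same position.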


“…It could promote semantic consistency of non-key frame, and do not add any computing burden at the same time. Moreover, with the aim of transferring context knowledge, instead of aligning feature maps directly [61], we use the pair-wise similarity among pixels as knowledge.…”

Section: B Local Attention Based Module (mentioning)

confidence: 99%

Dual Correlation Network for Efficient Video Semantic Segmentation

Liao et al. 2024

IEEE Trans. Circuits Syst. Video Technol.
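The idea quoted above, using pair-wise similarity among pixels as the distilled knowledge rather than aligning feature maps directly, can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity and a mean-squared-error objective are choices made here, not details confirmed by the citing paper, and the function names and toy data are hypothetical.

```python
import math

def pairwise_similarity(features):
    """Cosine similarity between every pair of pixel feature vectors.

    `features` is a list of N pixel vectors (each a list of C floats).
    Returns an N x N nested list.
    """
    normed = []
    for vec in features:
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard all-zero vectors
        normed.append([v / norm for v in vec])
    return [[sum(a * b for a, b in zip(p, q)) for q in normed] for p in normed]

def similarity_distillation_loss(student_features, teacher_features):
    """Mean squared difference between student and teacher similarity maps."""
    student = pairwise_similarity(student_features)
    teacher = pairwise_similarity(teacher_features)
    n = len(student)
    squared = sum(
        (student[i][j] - teacher[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return squared / (n * n)

# Hypothetical toy data: 3 pixels with 2-channel features.
key_frame = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # "teacher" (key frame)
non_key_frame = [[2.0, 0.0], [0.0, 3.0], [4.0, 4.0]]  # "student" (non-key frame)
loss = similarity_distillation_loss(non_key_frame, key_frame)
```

Because only the N x N similarity maps are compared, the two feature sets may have different channel counts, and rescaling a feature vector leaves the loss unchanged. A loss of this kind is evaluated during training only, which is consistent with the quoted claim that the constraint adds no computing burden at inference.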


“…Zhao et al. [67] proposed the first weakly supervised video salient object detection model based on "fixation guided scribble annotations". And some methods used weakly-supervised approaches to video object segmentation by fusing information between different frames [68]-[70]. In contrast, Zhou et al. [71] relied only on the current frame image and the corresponding optical flow data to achieve the zero-shot video object segmentation.…”

Section: B Weakly Supervised Salient Object Detection (mentioning)

confidence: 99%

A Weakly Supervised Learning Framework for Salient Object Detection via Hybrid Labels

Cong, Qin, Zhang et al. 2023

IEEE Trans. Circuits Syst. Video Technol.


“…With the recent success of deep learning, deep metric learning (DML) methods have demonstrated strong ability in various tasks (Ge et al. 2021; Liu et al. 2021; Peng et al. 2021; Lin et al. 2021; Wang et al. 2020b), such as semantic search (Huang et al. 2020; Li et al. 2021b; Min et al. 2020a,b) and face recognition (Li et al. 2021a). Most existing approaches (Roth, Brattoli, and Ommer 2019; Wu et al. 2017) take as input a sample (e.g., an image or a document), use a trained neural network as an encoder and represent this sample with the output embedding.…”

Section: Introduction (mentioning)

confidence: 99%

Neighborhood-Adaptive Structure Augmented Metric Learning

Li², Xie et al. 2022

AAAI

Self Cite


Most metric learning techniques typically focus on sample embedding learning, while implicitly assuming a homogeneous local neighborhood around each sample, based on the metrics used in training (e.g., hypersphere for Euclidean distance or unit hyperspherical crown for cosine distance). As real-world data often lies on a low-dimensional manifold curved in a high-dimensional space, it is unlikely that every region of the manifold shares the same local structure in the input space. Besides, considering the non-linearity of neural networks, the local structure in the output embedding space may not be homogeneous as assumed. Therefore, representing each sample simply with its embedding while ignoring its individual neighborhood structure would have limitations in Embedding-Based Retrieval (EBR). By exploiting the heterogeneity of local structures in the embedding space, we propose a Neighborhood-Adaptive Structure Augmented metric learning framework (NASA), where the neighborhood structure is realized as a structure embedding, and learned along with the sample embedding in a self-supervised manner. In this way, without any modifications, most indexing techniques can be used to support large-scale EBR with NASA embeddings. Experiments on six standard benchmarks with two kinds of embeddings, i.e., binary embeddings and real-valued embeddings, show that our method significantly improves on and outperforms the state-of-the-art methods.

