Fast target-aware learning for few-shot video object segmentation

Chen, Yadang; Hao, Chuanyan; Yang, Zhi-Xin; Wu, Enhua

doi:10.1007/s11432-021-3396-7

Cited by 15 publications

(3 citation statements)

References 44 publications

(105 reference statements)


“…Recent research in matching-based video object segmentation (VOS) [1][2][3][4][5][6][7][8][9][17][18][19][20][21][22][23] has primarily focused on improving network structures, including building more efficient memory [5,6], adopting local matching [7][8][9], and incorporating background context [3,4]. Among them, STM [1] is the pioneer in the field of video object segmentation, which first proposed the design of the memory bank, enabling VOS methods to effectively utilize the temporal information in video sequences.…”

Section: Matching-based VOS Network (mentioning)

confidence: 99%

Mitigating Distractor Challenges in Video Object Segmentation through Shape and Motion Cues

Peng, Zhao, Zhang et al. 2024

Applied Sciences

Self Cite


Semi-supervised video object segmentation (VOS) aims to predict object masks in subsequent video frames, given the object mask of the initial frame. Mainstream methods currently leverage historical frame information to improve network performance. However, these methods face two issues: (1) they often overlook important shape information, which reduces accuracy when segmenting object-edge areas; (2) they often use pixel-level motion estimation to guide matching when handling distractor objects, which incurs heavy computation costs and struggles with occlusion and fast or blurry motion. For the first problem, this paper introduces an object shape extraction module that exploits both high-level and low-level features to obtain object shape information, which is then used to refine the predicted masks. For the second problem, this paper introduces a novel object-level motion prediction module that stores representative motion features during the training stage and predicts object motion by retrieving them during the inference stage. We evaluate our method on benchmark datasets against recent state-of-the-art methods, and the results demonstrate the effectiveness of the proposed method.


“…Convolutional Neural Networks (CNNs) [11,12,18,19] have long been recognized as powerful tools for image analysis due to their ability to learn hierarchical feature representations from raw image data. They have been particularly successful in various segmentation tasks, thanks to their robustness in extracting spatial features from images.…”

Section: CNN-based Remote Sensing Image Segmentation (mentioning)

confidence: 99%

“…The self-attention mechanism computes the response at a position as a weighted sum of the features at all positions in the data. This global context-awareness allows the Transformer to better capture intricate spatial structures and long-range dependencies that are characteristic of remote sensing images [19,29,30,31].…”

Section: Remote Sensing Image Segmentation Based On Self-attention Me... (mentioning)

confidence: 99%
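
The description of self-attention quoted above (the response at each position is a weighted sum of the features at all positions) can be illustrated with a minimal sketch. It assumes scaled dot-product weights with a softmax, and it uses the input features directly as queries, keys, and values with no learned projections. The function names are hypothetical, and this is not the implementation from the citing paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Return, for each position, a weighted sum of the features at all positions.

    Assumption: weights come from scaled dot products between the raw
    feature vectors (no learned query/key/value projections).
    """
    dim = len(features[0])
    outputs = []
    for query in features:
        # Similarity of this position to every position, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)  # non-negative, sums to 1
        # Weighted sum of all feature vectors, one output channel at a time.
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, features))
             for j in range(dim)]
        )
    return outputs
```

Because the weights at each position sum to 1, every output vector is a convex combination of the input features, which is what lets each position draw on context from the whole input.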

Spatial-Aware Transformer (SAT): Enhancing Global Modeling in Transformer Segmentation for Remote Sensing Images

et al. 2023

Self Cite


In this research, we present the Spatial-Aware Transformer (SAT), an enhanced implementation of the Swin Transformer module, purposed to augment the global modeling capabilities of existing transformer segmentation mechanisms within remote sensing. The current landscape of transformer segmentation techniques is encumbered by an inability to effectively model global dependencies, a deficiency that is especially pronounced in the context of occluded objects. Our innovative solution embeds spatial information into the Swin Transformer block, facilitating the creation of pixel-level correlations, and thereby significantly elevating the feature representation potency for occluded subjects. We have incorporated a boundary-aware module into our decoder to mitigate the commonly encountered shortcoming of inaccurate boundary segmentation. This component serves as an innovative refinement instrument, fortifying the precision of boundary demarcation. After these strategic enhancements, the Spatial-Aware Transformer achieved state-of-the-art performance benchmarks on the Potsdam, Vaihingen, and Aerial datasets, demonstrating its superior capabilities in recognizing occluded objects and distinguishing unique features, even under challenging conditions. This investigation constitutes a significant advancement toward optimizing transformer segmentation algorithms in remote sensing, opening a wealth of opportunities for future research and development.


Resolution-Agnostic Neural Compression for High-Fidelity Portrait Video Conferencing via Implicit Radiance Fields

Li, Liu, Peng et al. 2024

Communications in Computer and Information Science

