2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00374
See More, Know More: Unsupervised Video Object Segmentation With Co-Attention Siamese Networks

Abstract: We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view. We emphasize the importance of inherent correlation among video frames and incorporate a global co-attention mechanism to improve further the state-of-the-art deep learning based solutions that primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in our net…

Cited by 456 publications (244 citation statements)
References 62 publications
“…The so-called attention mechanism is a way of observing the world that imitates human perception [35]. Recently, it has been shown to be a simple but effective tool for improving the representational ability of CNNs through reweighting of the feature maps; this is achieved using spatial attention and channel attention to scale features according to whether they are meaningful or uninformative [36][37][38][39][40][41][42][43].…”
Section: Attention Mechanism
confidence: 99%
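The channel-attention reweighting that this citation statement describes can be sketched in a few lines. The following is a minimal squeeze-and-excitation-style example, not the implementation from any of the cited papers; the weight shapes and the reduction ratio are illustrative assumptions.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Reweight each channel of a (C, H, W) feature map.

    w1: (C // r, C) and w2: (C, C // r) form a small bottleneck MLP
    (r is an assumed reduction ratio), as in squeeze-and-excitation.
    """
    squeeze = feat.mean(axis=(1, 2))               # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gates in (0, 1)
    return feat * scale[:, None, None]             # scale each channel

# Toy usage: 4 channels, 2x2 spatial grid, reduction ratio r = 2.
feat = np.ones((4, 2, 2))
out = channel_attention(feat, np.ones((2, 4)), np.ones((4, 2)))
```

Meaningful channels receive gates near 1 and pass through almost unchanged, while uninformative channels are suppressed toward 0; spatial attention works analogously with a per-location gate instead of a per-channel one.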
“…More recently, in order to understand fine-grained relationships and mine the underlying correlations between different modalities, co-attention mechanisms have been widely studied in vision-and-language tasks such as visual question answering (VQA) [46][47][48][49]. In computer vision, Lu et al., inspired by the above-mentioned works, built a co-attention module that captures the coherence between video frames and effectively surpasses current alternatives [40].…”
Section: Attention Mechanism
confidence: 99%
“…In the area of segmentation, semantic segmentation and panoptic segmentation [43][44][45][46] use the attention mechanism to guide the feed-forward network toward more accurate segmentation. In particular, the attention mechanism in video object segmentation helps the model focus on target objects and overlook confusing background [41,47,48]. The attention mechanism itself comes in several variants: hierarchical attention [49], self-attention [50], and co-attention [51].…”
Section: Attention Mechanism
confidence: 99%
“…tasks [20], [21], [22], [23], including visual tracking [24], [25], have been shown to benefit from powerful deep discriminative features [26], [27], [28], [29]. The top-ranked trackers in recent competitions, e.g., OTB100 [8], VOT2017 [30], and VOT2018 [31], are all based on deep neural network features.…”
Section: Introduction
confidence: 99%