Learning Discriminative Model Prediction for Tracking

Bhat, Goutam; Danelljan, Martin; Gool, Luc Van; Timofte, Radu

doi:10.1109/iccv.2019.00628

Cited by 980 publications

(1,053 citation statements)

References 39 publications

Supporting

Mentioning

1,035

Contrasting

Unclassified

“…With a total of 123 videos, the size of the dataset is approximately 13.5 G. Figure 12 compares our tracker with four state-of-the-art trackers in terms of success rate and speed. The success rate of our tracker is slightly lower than that of DiMP50 [17], yet its speed is higher. Moreover, although DiMP18 is faster than our tracker, its success rate is lower.…”

Section: Experiments and Discussion (mentioning)

confidence: 86%

“…The networks of such approaches require constant fine-tuning, preventing real-time tracking requirements to be met. From the perspective of methods, in addition to the Siamese network-based methods that have dominated in recent years (e.g., [12]), a research branch began to focus on small sample learning target tracking methods represented by Meta Learning (e.g., [13, 14]), with another research branch always insisting on the use of correlation filter approaches (e.g., [15, 16, 17, 18]). Ocean [12] represents the trackers based on a Siamese network evolved from Anchor-Based to Anchor-Free.…”

Section: Introduction (mentioning)

confidence: 99%

“…Although Meta Learning is of great research value in target tracking tasks, current methods based on Meta Learning have poor performance when the background is complex. DiMP [17] introduced Meta Learning after ATOM [16] to update the template. PrDiMP [18] proposed a probabilistic regression formulation and applied it to tracking.…”

Section: Introduction (mentioning)

confidence: 99%

A Visual Tracker Offering More Solutions

Zhao, Mahmoud, Ren, et al. 2020. Sensors.

Most trackers focus solely on robustness and accuracy. Visual tracking, however, is a long-term problem with a high time limitation. A tracker that is robust, accurate, with long-term sustainability and real-time processing, is of high research value and practical significance. In this paper, we comprehensively consider these requirements in order to propose a new, state-of-the-art tracker with an excellent performance. EfficientNet-B0 is adopted for the first time via neural architecture search technology as the backbone network for the tracking task. This improves the network feature extraction ability and significantly reduces the number of parameters required for the tracker backbone network. In addition, maximal Distance Intersection-over-Union is set as the target estimation method, enhancing network stability and increasing the offline training convergence rate. Channel and spatial dual attention mechanisms are employed in the target classification module to improve the discrimination of the trackers. Furthermore, the conjugate gradient optimization strategy increases the speed of the online learning target classification module. A two-stage search method combined with a screening module is proposed to enable the tracker to cope with sudden target movement and reappearance following a brief disappearance. Our proposed method has an obvious speed advantage compared with pure global searching and achieves an optimal performance on OTB2015, VOT2016, VOT2018-LT, UAV-123 and LaSOT while running at over 50 FPS.
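The abstract above names Distance Intersection-over-Union (DIoU) as the target estimation criterion. As a minimal sketch, the standard DIoU between two axis-aligned boxes is the IoU minus the squared centre distance divided by the squared diagonal of the smallest enclosing box. The function name `diou` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration; the abstract does not say how the tracker computes or maximises this quantity.

```python
def diou(box_a, box_b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2.

    Illustrative helper only; not taken from the cited tracker's code.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Squared distance between the two box centres.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both inputs.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2

    return iou - d2 / c2
```

Unlike plain IoU, this score still varies for non-overlapping boxes, because the centre-distance penalty shrinks as the boxes approach each other.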

“…They implement fine-tuning the backbone for the end-to-end training. After an analysis on the impact of different feature blocks in DiMP [5], they use the features from block3 and block4 for IoU-Net, and only from block4 for the classifier. The feature extractor F is shared and only performed on a single image patch per frame.…”

Section: Baseline RGB Tracker (mentioning)

confidence: 99%

“…Finally, we can see how fine-tuning only on RGB improves the performance of the pre-trained model, but to a lesser extent than using TIR. In the lower part of Table 1 we analyze the effectiveness of each fusion mechanism for DiMP [5], which we discuss in detail in the remainder of this section. Pixel-level fusion.…”

Section: Implementation Details (mentioning)

confidence: 99%

Multi-Modal Fusion for End-to-End RGB-T Tracking

Zhang, Danelljan, González-García, et al. 2019. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

Self Cite

110

We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e. feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on VOT-RGBT2019 dataset and RGBT210 dataset, evaluating each type of modality fusing on each model component. The results show that the proposed fusion mechanisms improve the performance of the single modality counterparts. We obtain our best results when fusing at the feature-level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on VOT-RGBT2019 dataset. With this fusion mechanism we achieve the state-of-the-art performance on RGBT210 dataset.

A Multi‐Template Fusion Object Tracking Algorithm Based on Graph Attention Network

Wang et al. 2022. IEEJ Transactions Elec Engng.

In recent years, the object-tracking algorithm based on Siamese network has gradually become the mainstream algorithm in the field of object tracking due to its characteristics of balancing speed and accuracy. The majority of Siamese-based trackers only use the first frame extraction template for subsequent tracking in order to prevent the introduction of noise. However, merely with a single initial template employed, it is difficult to achieve the best performance of the tracker in the face of complex tracking environments such as occlusion, motion blur, and non-rigid deformation. Therefore, the present paper proposes a new multi-template fusion module based on graph attention network (G-M module), which consists of two parts: a graph-attention-network-based feature-embedding module (G module) and a multi-template fusion module (M module). It can greatly reduce the background noise introduced by template updating while improving the tracker's ability to adapt to changes in object appearance. In addition, in order to maximize the value of G-M module, the present paper also puts forward a two-stage template update threshold judgment mechanism. The Pearson correlation coefficient (PCCs) is introduced and combined with APCE and the maximum response value (F-max) to filter out reliable templates for updating. In this paper, the proposed method is applied to the SiamFC and SiamFC++ trackers. Extensive experiments on mainstream data sets, such as OTB2015, VOT2016, and GOT-10k, show that the proposed method can effectively update the tracking template and improve the tracker performance.
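The three quantities this abstract combines for template filtering can be sketched as follows. APCE (average peak-to-correlation energy) is computed with its commonly used definition, the squared peak-to-minimum gap of the response map divided by the mean squared deviation from the minimum. F-max is the map's maximum, and the Pearson coefficient compares two feature vectors. The two-stage ordering in `is_reliable`, its thresholds, and all function names are assumptions for illustration; the abstract does not specify the actual decision rule.

```python
from math import sqrt


def apce(response):
    """Average peak-to-correlation energy of a 2-D response map (list of rows)."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    mean_sq = sum((v - f_min) ** 2 for v in values) / len(values)
    return (f_max - f_min) ** 2 / mean_sq


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def is_reliable(response, initial_feat, candidate_feat,
                apce_thr, fmax_thr, pcc_thr):
    """Hypothetical two-stage check; thresholds are caller-supplied."""
    # Stage 1 (assumed): the response map must be sharp and confident.
    f_max = max(v for row in response for v in row)
    if apce(response) < apce_thr or f_max < fmax_thr:
        return False
    # Stage 2 (assumed): the candidate must still resemble the initial template.
    return pearson(initial_feat, candidate_feat) >= pcc_thr
```

A single sharp peak gives a high APCE, while a map with several comparable peaks (for example under occlusion) gives a low one, which is why it is a natural gate before accepting a new template. A perfectly flat response map would make the denominator zero, so a real implementation would need to guard that case.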
