2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.279

CREST: Convolutional Residual Learning for Visual Tracking

Abstract: Discriminative correlation filters (DCFs) have been shown to achieve superior performance in visual tracking. They need only a small set of training samples from the initial frame to generate an appearance model. However, existing DCFs learn the filters separately from feature extraction, and update these filters using a moving average operation with an empirical weight. These DCF trackers hardly benefit from end-to-end training. In this paper, we propose the CREST algorithm to reformulate DCFs as a one-layer convolutional neural network.
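The abstract's core idea — recasting the correlation filter as a single convolutional layer, so filter learning and updating become ordinary gradient descent that can propagate into feature extraction — can be illustrated with a short sketch. This is a minimal illustration under assumed shapes and hyperparameters, not the authors' implementation; the class `OneLayerDCF` and all sizes are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of a DCF recast as a one-layer convolution (the abstract's
# core idea). The filter weights are an ordinary conv kernel, so an L2 loss
# against a Gaussian response map can be minimized by gradient descent
# end-to-end. All names and sizes are illustrative, not the paper's code.

class OneLayerDCF(nn.Module):
    def __init__(self, in_channels=64, kernel_size=31):
        super().__init__()
        # The "correlation filter": one conv layer producing one response map.
        self.filter = nn.Conv2d(in_channels, 1, kernel_size,
                                padding=kernel_size // 2, bias=False)

    def forward(self, feat):
        return self.filter(feat)  # response map; peak should sit on the target

def gaussian_label(h, w, sigma=2.0):
    """Ideal response: a 2-D Gaussian centered on the target."""
    ys = torch.arange(h).float() - (h - 1) / 2
    xs = torch.arange(w).float() - (w - 1) / 2
    g = torch.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))
    return g[None, None]  # shape (1, 1, h, w)

# Training on the initial frame: regress the Gaussian map by SGD instead of
# the closed-form DCF solution, so feature layers could be trained jointly.
feat = torch.randn(1, 64, 61, 61)   # stand-in for CNN features of the patch
model = OneLayerDCF()
target = gaussian_label(61, 61)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = F.mse_loss(model(feat), target)
    loss.backward()
    opt.step()
```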

Cited by 523 publications (396 citation statements). References 45 publications (124 reference statements).
“…Additionally, spatial-temporal context [86] and kernel tricks [27] are used to improve the learning formulation with the consideration of local appearance and nonlinear metric, respectively. The DCF paradigm has further been extended by exploiting scale detection [41,14,16], structural patch analysis [42,46,45], multi-clue fusion [71,50,28,4,72], sparse representation [88,90], support vector machine [75,92], enhanced sampling mechanisms [89,54] and end-to-end deep neural networks [73,67].…”
Section: Related Work (mentioning)
confidence: 99%
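For context on the formulation these extensions build on: the standard single-channel DCF has a closed-form solution in the Fourier domain (the MOSSE-style ridge-regression filter), updated over time by the moving average the CREST abstract criticizes. A minimal NumPy sketch, with all names and shapes illustrative:

```python
import numpy as np

# Classical single-channel DCF in closed form (MOSSE-style ridge regression).
# In the Fourier domain the optimal filter satisfies
#   conj(H) = (G * conj(F)) / (F * conj(F) + lam),
# where F and G are the spectra of the feature patch and of the desired
# Gaussian response, and lam is the ridge regularizer.

def train_dcf(patch, response, lam=1e-2):
    F_hat = np.fft.fft2(patch)
    G_hat = np.fft.fft2(response)
    return (G_hat * np.conj(F_hat)) / (F_hat * np.conj(F_hat) + lam)  # conj(H)

def detect(H_conj, patch):
    # Correlation becomes element-wise multiplication in the Fourier domain.
    resp = np.real(np.fft.ifft2(np.fft.fft2(patch) * H_conj))
    return np.unravel_index(resp.argmax(), resp.shape)  # peak = target position

def moving_average_update(H_prev, H_new, eta=0.02):
    # The "moving average with an empirical weight" the CREST abstract
    # criticizes: a fixed linear interpolation of filters over time.
    return (1 - eta) * H_prev + eta * H_new
```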
“…We evaluated the proposed method on several well-known benchmarks, including OTB2013/OTB2015 [81,82], VOT2017/VOT2018 [33,34] and the TrackingNet test dataset [55], and compared it with a number of state-of-the-art trackers, such as VITAL [68], MetaT [58], ECO [13], MCPF [89], CREST [67], BACF [31], CFNet [73], CACF [54], ACFN [11], CSRDCF [49], C-COT [51], Staple [4], SiamFC [5], SRDCF [15], KCF [27], SAMF [41], DSST [16], and other advanced trackers in VOT challenges, i.e., CFCF [23], CFWCR [25], LSART [69], UPDT [6], SiamRPN [91], MFT [34] and LADCF [83].…”
Section: Implementation and Evaluation Settings (mentioning)
confidence: 99%
“…Wang et al [36] train two separate convolutional layers to regress Gaussian maps with the initial frame and update these layers every few frames. Similarly, Song et al [32] also utilize a number of gradient descent iterations in initialization and online update procedures. These trackers need many training iterations to capture the appearance variations of the target, which makes the tracker less effective and far from real-time requirements.…”
Section: Model Updating In Tracking (mentioning)
confidence: 99%
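The update scheme this excerpt describes — a handful of gradient-descent iterations at initialization and at each refresh, rather than a moving-average filter update — can be sketched as follows. This is a hypothetical helper, not code from either cited paper; `model` is any network mapping features to a response map (e.g., the `OneLayerDCF` sketch above), and the step count and learning rate are illustrative:

```python
import torch
import torch.nn.functional as F

def online_update(model, feats, labels, steps=2, lr=5e-5):
    """Fine-tune the response-regression layer(s) on recent frames.

    feats/labels: lists of recent feature maps and Gaussian target maps.
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        for x, y in zip(feats, labels):
            opt.zero_grad()
            loss = F.mse_loss(model(x), y)  # regress the Gaussian response
            loss.backward()
            opt.step()
    return model
```

Each refresh costs several forward/backward passes, which is precisely the efficiency concern the excerpt raises.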
“…There are two groups of deep-learning-based trackers. The first group [36,28,32,4] improves the discriminative ability of deep networks by frequent online update. They utilize the first frame to initialize the model and update it…”
Section: Introduction (mentioning)
confidence: 99%
“…The speed of a tracking algorithm is measured in Frames Per Second (FPS). We compare our ACFT with a number of state-of-the-art DCF trackers, including MetaT [48] (ECCV18), MCPF [49] (CVPR17), CREST [50] (ICCV17), BACF [30] (ICCV17), CFNet [51] (CVPR17), STAPLE_CA [52] (CVPR17), ACFN [53] (CVPR17), CSRDCF [31] (CVPR17), C-COT [45] (ECCV16), Staple [27] (CVPR16), SRDCF [11] (ICCV15), KCF [43] (TPAMI15), SAMF [54] (ECCVW14) and DSST [55] (TPAMI17). The VOT2017 benchmark consists of 60 challenging video sequences.…”
Section: Datasets and Evaluation Metrics (mentioning)
confidence: 99%