Learning to Track at 100 FPS with Deep Regression Networks

Held, David; Thrun, Sebastian; Savarese, Silvio

doi:10.1007/978-3-319-46448-0_45

Cited by 973 publications

(806 citation statements)

References 40 publications

Supporting

Mentioning

786

Contrasting


“…The common approach is to specify a target by means of a bounding box around the object and to track this target as it moves throughout the video [38,33,20]. The paradigm has proven to be effective and considerable progress has been achieved [17,37,34,3,11]. Yet, the fundamental assumption of having a bounding box target specification available has never been challenged.…”

Section: Introduction (mentioning)

confidence: 99%

Tracking by Natural Language Specification

Tao, Gavves, et al. 2017

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

114

176


“…In order to describe the target better, we applied convolutional networks to learn robust representations for visual tracking without offline training using a large amount of auxiliary data, which is inspired by recent studies [11,18]. First, we use predefined convolutional filters to extract the high-order features.…”

Section: Convolutional Network Model (mentioning)

confidence: 99%

Target Tracking via Particle Filter and Convolutional Network

Chu, Wang, Xing 2018

Journal of Electrical and Computer Engineering


We propose a more effective tracking algorithm which can work robustly in a complex scene such as illumination, appearance change, and partial occlusion. The algorithm is based on an improved particle filter which used the efficient design of observation model. Predefined convolutional filters are used to extract the high-order features. The global representation is generated by combining local features without changing their structures and space arrangements. It not only increases the feature invariance, but also maintains the specificity. The extracted feature from convolution network is introduced into particle filter algorithm. The observation model is constructed by fusing the color feature of the target and a set of features from templates which are extracted by convolutional networks without training in our paper. It is fused with the features extracted from convolutional network for tracking. In the process of tracking, the template is updated in real time, and then the robustness of the algorithm is improved. Experiments show that the algorithm can achieve an ideal tracking effect when the targets are in a complex environment.


“…In (Hong et al, 2015a), sampled feature maps are classified by SVM to generate a saliency map. More recently, a number of studies use Recurrent Neural Networks (RNNs) for visual tracking (Bertinetto et al, 2016, Held et al, 2016, Chen and Tao, 2016. In (Held et al, 2016), Held et.…”

Section: Related Work (mentioning)

confidence: 99%

“…In the first image, the target is cropped as target template with some background texture. Using the sample generation idea from (Held et al, 2016), we randomly shift and scale the target in the second image as search image to simulate the motion of the target and camera simultaneously. In tracking, we generally cropped a search image in the new frame based on the target's previous location instead to track the target in the whole image.…”

Section: Heat Map For Target Localization Prediction (mentioning)

confidence: 99%

“…where u = 0, for both scale and translation changes for scale changes, b = bs = 1/5 for translation change, b = bt = 1/15, Also, we enforce the scale change to be less than ±0.4 and the center of the translated target is still in the search image similar to the work in (Held et al, 2016). The ImageNet dataset is mainly used to teach localization network to find object boundary and the smooth motion as complementary data in the limited ALOV300+ dataset.…”

Section: The Localization Network (mentioning)

confidence: 99%
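
The two statements above describe generating training pairs by randomly shifting and scaling the target box. A minimal sketch of that idea follows, under several assumptions not stated in the quotes: the offsets are drawn from a zero-mean Laplace distribution (the quotes give only a mean u = 0 and scale values b), scale changes are applied multiplicatively, translations are measured relative to the box size, and invalid draws are handled by resampling. The function names `laplace` and `perturb_box` are hypothetical, and the default b values simply follow the quoted statement.

```python
import random


def laplace(b, rng):
    """Draw from a zero-mean Laplace distribution with scale b."""
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def perturb_box(cx, cy, w, h, search_w, search_h,
                b_scale=1 / 5, b_trans=1 / 15,
                max_scale_change=0.4, rng=None, max_tries=100):
    """Randomly shift and scale a box given as (center x, center y, width, height).

    Assumption: coordinates are in search-image pixels, and the original
    center already lies inside the search image.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        ds_w = laplace(b_scale, rng)
        ds_h = laplace(b_scale, rng)
        # Enforce a scale change of less than +/-0.4 by resampling.
        if abs(ds_w) >= max_scale_change or abs(ds_h) >= max_scale_change:
            continue
        new_cx = cx + w * laplace(b_trans, rng)
        new_cy = cy + h * laplace(b_trans, rng)
        # Keep the translated center inside the search image.
        if 0 <= new_cx <= search_w and 0 <= new_cy <= search_h:
            return new_cx, new_cy, w * (1 + ds_w), h * (1 + ds_h)
    # Fall back to the unperturbed box if no valid draw was found.
    return cx, cy, w, h
```

Calling `perturb_box(50, 50, 20, 30, 100, 100, rng=random.Random(0))` returns one perturbed box; repeated calls with the same generator give a stream of simulated motions for a single annotated frame.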


Visual Tracking Utilizing Object Concept From Deep Learning Network

Xiao, Yilmaz, Lia 2017

ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.


ABSTRACT: Despite having achieved good performance, visual tracking is still an open area of research, especially when target undergoes serious appearance changes which are not included in the model. So, in this paper, we replace the appearance model by a concept model which is learned from large-scale datasets using a deep learning network. The concept model is a combination of high-level semantic information that is learned from myriads of objects with various appearances. In our tracking method, we generate the target's concept by combining the learned object concepts from classification task. We also demonstrate that the last convolutional feature map can be used to generate a heat map to highlight the possible location of the given target in new frames. Finally, in the proposed tracking framework, we utilize the target image, the search image cropped from the new frame and their heat maps as input into a localization network to find the final target position. Compared to the other state-of-the-art trackers, the proposed method shows the comparable and at times better performance in real-time.

