“…In particular, the observation model aims to reflect the similarity between the initially specified object of interest and the tracked object through the video sequence. The similarity could be modeled either by a color similarity based distance metric [9] or in terms of a reconstruction error minimized by using a sparse representation of the target object [17,18,19,20]. As a discriminative detector the statistical classifiers including support vector machines (SVM), online or offline learning schemes and CNN based detectors are widely used [11,16,21,22,23,24,25,26].…”