“…Extracting a robust target representation is the critical component of state-of-the-art visual tracking methods to overcome these challenges. Hence, to robustly model target appearance, these methods utilize a wide range of handcrafted features (e.g., [41,7,87,55,92,68], which exploit histogram of oriented gradients (HOG) [11], histogram of local intensities (HOI), and Color Names (CN) [81]), deep features from deep neural networks (e.g., [61,63,85,2,79,40]), or both (e.g., [14,16,64]).…”