“…• Robust target representation: Providing a powerful target representation is the main advantage of employing CNNs for visual tracking. To achieve the goal of learning generic representations for target modeling and constructing a more robust target models, the main contributions of methods are classified into: i) offline training of CNNs on large-scale datasets for visual tracking [63], [68], [80], [89], [97], [100], [101], [104], [112], [116], [135], [137], [142], [144], [153], [165], [168], [169], [173], ii) designing specific deep convolutional networks instead of employing pre-trained models [63], [68], [70], [72], [73], [75], [76], [80], [82], [89], [97], [100], [101], [104], [105], [108], [112], [116], [127], [135], [137], [141], [142], [144], [146], [150],…”