“…The two auxiliary streams extracted modality-specific features from the visible and infrared images, respectively. Unlike the former two methods, the enhanced background perception correlation filtering method [29] adopted a strategy of fusing first and then tracking: it converted the infrared image into a single-channel image and used the grayscale information to determine the degree of pixel difference between the target and the overall environment. Target tracking was then achieved through adaptive weighted decisions on the visible and infrared images.…”
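
The fuse-then-track idea described above can be illustrated with a minimal sketch. This is not the method of [29]; it only assumes that each modality is weighted by how strongly the target's mean grayscale value differs from the rest of the frame. The function names (`to_gray`, `contrast`, `fusion_weights`, `fuse`), the channel-averaging conversion, and the `(x, y, w, h)` box layout are all hypothetical choices made for illustration.

```python
import numpy as np


def to_gray(image):
    """Collapse an H x W x C image to one channel by averaging (assumed conversion)."""
    image = np.asarray(image, dtype=np.float64)
    return image if image.ndim == 2 else image.mean(axis=2)


def contrast(gray, box):
    """Absolute gap between mean intensity inside the target box and outside it.

    Assumes the box (x, y, w, h) lies strictly inside the image, so that both
    the target region and the background region are non-empty.
    """
    x, y, w, h = box
    mask = np.zeros(gray.shape, dtype=bool)
    mask[y:y + h, x:x + w] = True
    return abs(gray[mask].mean() - gray[~mask].mean())


def fusion_weights(visible, infrared, box):
    """Weight each modality in proportion to its target/background contrast."""
    c_vis = contrast(to_gray(visible), box)
    c_ir = contrast(to_gray(infrared), box)
    total = c_vis + c_ir
    if total == 0.0:
        # Neither modality separates the target; fall back to equal weights.
        return 0.5, 0.5
    return c_vis / total, c_ir / total


def fuse(visible, infrared, box):
    """Blend the two grayscale images with the adaptive weights."""
    w_vis, w_ir = fusion_weights(visible, infrared, box)
    return w_vis * to_gray(visible) + w_ir * to_gray(infrared)


if __name__ == "__main__":
    # Synthetic frame: the target is invisible in the visible image
    # but clearly warmer than the background in the infrared image.
    visible = np.full((10, 10, 3), 100.0)
    infrared = np.zeros((10, 10))
    infrared[2:5, 3:6] = 200.0
    print(fusion_weights(visible, infrared, box=(3, 2, 3, 3)))
```

In this sketch the fused single-channel image would then be handed to any single-modality tracker, such as a correlation filter. How [29] actually computes its weights and where in the pipeline the decision is applied is not stated in the passage, so those details are left open here.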