“…Deep-learning-based methods such as SSD [19], Faster R-CNN [23], YOLO [22], and U-Net [24] have achieved excellent results in visible-light object detection and segmentation. To address ISTD, many techniques have been proposed, including model pruning [36,8,30], multi-scale fusion [34,10,16], multi-modal fusion [25,29,26], and feature pyramids [27,35,15]. ACM [5] and ALC-Net [6] use a top-down global attention module and a bottom-up local attention module to transfer semantic information and context information separately, and to prevent small targets from disappearing.…”
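
The two-way attention described for ACM [5] and ALC-Net [6] can be illustrated with a minimal NumPy sketch. It is not the reference implementation of either network. It assumes that the global attention is global average pooling followed by a channel bottleneck and a sigmoid, that the local attention is the same bottleneck applied independently at every pixel, and that both feature maps already share one shape. All function names, sizes, and the random weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bottleneck(x, w1, w2):
    """Two-layer channel bottleneck along the last axis, ReLU in between."""
    return np.maximum(x @ w1, 0.0) @ w2


def top_down_global_attention(high, w1, w2):
    """Per-channel weights from globally pooled high-level features, shape (C,)."""
    pooled = high.mean(axis=(1, 2))  # global average pooling over H and W
    return sigmoid(bottleneck(pooled, w1, w2))


def bottom_up_local_attention(low, w1, w2):
    """Per-pixel, per-channel weights from low-level features, shape (C, H, W)."""
    pixels = np.moveaxis(low, 0, -1)  # (H, W, C): each pixel is handled on its own
    return np.moveaxis(sigmoid(bottleneck(pixels, w1, w2)), -1, 0)


def fuse(low, high, params):
    """Modulate each branch with attention computed from the other, then sum."""
    g = top_down_global_attention(high, *params["global"])  # semantics guide low level
    l = bottom_up_local_attention(low, *params["local"])    # local detail guides high level
    return low * g[:, None, None] + high * l


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, R = 8, 16, 16, 2  # channels, height, width, bottleneck width


def init_weights():
    return (0.1 * rng.standard_normal((C, R)), 0.1 * rng.standard_normal((R, C)))


params = {"global": init_weights(), "local": init_weights()}
low = rng.standard_normal((C, H, W))   # shallow, fine-detail features
high = rng.standard_normal((C, H, W))  # deep, semantic features (assumed upsampled)
fused = fuse(low, high, params)
print(fused.shape)
```

Because the local branch never mixes neighbouring pixels, a target that occupies only a few pixels keeps its own attention weights instead of being averaged away, which is the intuition behind preventing small targets from disappearing.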