“…In the implementation process, we use the Darknet-53 pre-trained on ImageNet [47] as the feature extractor of YOLOv3-based detector and then fine-tune the ps2.0 dataset [11]. In the process of fine-tuning, the batch size is 32, the image is scaled to 416 × 416, the anchors are modified for ps2.0 dataset to [ (10,13), (28,42), (33,23), (30,61), (62, 45), (61, 199), (126, 87), (156,198)], and the learning rate starts from 0.0001 and is decayed by 10 every 45,000 steps. The Adam optimizer is used with the proposed optimization setting in [48] with [β 1 , β 2 , ε] = [0.9, 0.999, 10 −8 ].…”