“…Object detection is a fundamental task in computer vision that requires identifying object categories and localizing each object's full extent with a bounding box. With the development of convolutional neural networks (CNNs) [1, 2, 3], object detection methods [4, 5, 6, 7, 8, 9, 10, 11, 12, 13] such as Fast R-CNN [4], Faster R-CNN [5], SSD [6], and YOLO [7] have made significant progress. However, these methods require full supervision, i.e., instance-level annotations, which are time-consuming and labor-intensive to produce.…”