Abstract: In object detection, an intersection over union (IoU) threshold is required to define positives and negatives. An object detector trained with a low IoU threshold, e.g. 0.5, usually produces noisy detections. However, detection performance tends to degrade with increasing IoU thresholds. Two main factors are responsible for this: 1) overfitting during training, due to exponentially vanishing positive samples, and 2) inference-time mismatch between the IoUs for which the detector is optimal and those of th…
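To make the role of the IoU threshold concrete, the sketch below computes pairwise IoU and labels proposals as positive or negative at a given threshold. It assumes boxes in `(x1, y1, x2, y2)` format, and the helper names `box_iou` and `assign_labels` are illustrative, not taken from any Cascade R-CNN codebase. The thresholds 0.5, 0.6 and 0.7 are example values for successive stages.

```python
import numpy as np

def box_iou(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle, broadcast to shape (N, M, 2).
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0, None)  # zero width/height if disjoint
    inter = wh[..., 0] * wh[..., 1]
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / union

def assign_labels(proposals: np.ndarray, gt_boxes: np.ndarray, iou_threshold: float) -> np.ndarray:
    """Label a proposal positive (1) if its best IoU with any ground truth reaches the threshold, else negative (0)."""
    best_iou = box_iou(proposals, gt_boxes).max(axis=1)
    return (best_iou >= iou_threshold).astype(int)

# Proposals whose IoU with a single ground-truth box is 1.0, 0.85, 0.65, 0.55 and 0.35.
gt = np.array([[0.0, 0.0, 10.0, 10.0]])
props = np.array([[0.0, 0.0, 10.0, h] for h in (10.0, 8.5, 6.5, 5.5, 3.5)])
for t in (0.5, 0.6, 0.7):
    print(t, int(assign_labels(props, gt, t).sum()))  # positives shrink as t grows
```

Raising the threshold from 0.5 to 0.7 leaves fewer positives for the same proposal set (4, then 3, then 2 here). This is the shortage of positive samples that the abstract links to overfitting. In a cascade, each stage's regressed boxes are passed to the next stage, which raises their IoUs before the stricter threshold is applied.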
“…Object detection has attracted a great deal of attention in recent years [4,13,14,16,19,20,27,28,30,38,39,43,47,48,56]. One popular direction for recent object detection is proposal-based object detectors (a.k.a.…”
In this paper, we focus on semi-supervised object detection to boost the accuracy of proposal-based object detectors (a.k.a. two-stage object detectors) by training on both labeled and unlabeled data. However, it is non-trivial to train object detectors on unlabeled data because ground-truth labels are unavailable. To address this problem, we present a proposal learning approach to learn proposal features and predictions from both labeled and unlabeled data. The approach consists of a self-supervised proposal learning module and a consistency-based proposal learning module. In the self-supervised proposal learning module, we present a proposal location loss and a contrastive loss to learn context-aware and noise-robust proposal features, respectively. In the consistency-based proposal learning module, we apply consistency losses to both the bounding box classification and regression predictions of proposals to learn noise-robust proposal features and predictions. Experiments are conducted on the COCO dataset with all available labeled and unlabeled data. Results show that our approach consistently improves the accuracy of fully-supervised baselines. In particular, after combining with data distillation [37], our approach improves AP by about 2.0% and 0.9% on average compared with fully-supervised baselines and data distillation baselines, respectively.
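The consistency-based module compares predictions made from clean and perturbed proposal features, which requires no labels. The sketch below is a minimal forward-only version of that idea. It assumes KL divergence for the classification consistency and smooth L1 for the regression consistency. These loss choices and the function names are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def classification_consistency(logits_clean: np.ndarray, logits_noisy: np.ndarray) -> float:
    """Mean KL(p_clean || p_noisy) over proposals (assumed loss choice)."""
    log_p = log_softmax(logits_clean)
    log_q = log_softmax(logits_noisy)
    return float(np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)))

def smooth_l1(x: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Quadratic below beta, linear above it."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

def regression_consistency(deltas_clean: np.ndarray, deltas_noisy: np.ndarray, beta: float = 1.0) -> float:
    """Mean smooth-L1 distance between box deltas predicted from clean and perturbed features."""
    return float(np.mean(np.sum(smooth_l1(deltas_noisy - deltas_clean, beta), axis=-1)))

# One proposal, three classes, 4 box deltas; the "noisy" predictions are hypothetical.
logits = np.array([[2.0, 0.5, -1.0]])
deltas = np.array([[0.1, -0.2, 0.0, 0.3]])
print(classification_consistency(logits, logits[:, ::-1]))
print(regression_consistency(deltas, deltas + 2.0))
```

In an actual training loop, the clean prediction would typically be treated as a fixed target (no gradient through it), so only the perturbed branch is pushed toward agreement. Plain NumPy has no autograd, so this sketch only shows the loss values.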
“…Compared with the baseline, our model outputs more accurate boxes and detects pedestrians with heavy occlusion. equation (5), the performance reaches 12.96%. Though the sign prediction loss indeed helps improve the performance, one can argue that it is because the loss involved with box prediction is increased and the sign predictor structure is not necessary.…”
Section: Methods (mentioning)
“…It only selects the proper samples which fall in the desired scale range under different pyramids for training. Cascade R-CNN [5] adopts cascaded classifiers where training samples with increasingly higher overlap with ground truths are fed. Online hard example mining (OHEM) [6] dynamically chooses the samples with the highest loss in a batch to achieve better convergence and performance.…”
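The OHEM step described in the quote ranks candidate RoIs by their current loss and keeps the hardest ones for the backward pass. The sketch below shows only that selection step. Published OHEM also applies NMS to the loss-ranked RoIs to drop near-duplicates, which is omitted here. The function name is hypothetical.

```python
import numpy as np

def select_hard_examples(per_roi_losses, batch_size: int) -> np.ndarray:
    """Return indices of the `batch_size` RoIs with the highest loss, hardest first."""
    losses = np.asarray(per_roi_losses, dtype=float)
    k = min(batch_size, losses.size)
    # Stable sort on negated losses gives descending order with deterministic tie-breaking.
    order = np.argsort(-losses, kind="stable")
    return order[:k]

print(select_hard_examples([0.1, 2.0, 0.5, 1.2], batch_size=2))
```

Only the selected RoIs would contribute gradients, so the effective mini-batch concentrates on examples the detector currently gets wrong.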
Training a robust classifier and an accurate box regressor is difficult for occluded pedestrian detection. The traditionally adopted Intersection over Union (IoU) measurement does not consider the occluded region of the object and leads to improper training samples. To address this issue, a modification called visible IoU is proposed in this paper to explicitly incorporate the visible ratio when selecting samples. Then a newly designed box sign predictor is placed in parallel with the box regressor to separately predict the moving direction of training samples. It leads to higher localization accuracy by introducing a sign prediction loss during training and sign refining at test time. With these novelties, we obtain state-of-the-art performance on the CityPersons benchmark for occluded pedestrian detection.
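The abstract does not give the visible-IoU formula. The sketch below uses one plausible formulation, stated here as an assumption: only overlap with the visible box counts in the numerator, while the union is taken with the full-body box. The formula is vIoU = |P ∩ V| / |P ∪ F|. It reduces to ordinary IoU when the pedestrian is fully visible. The sign target is assumed to be the sign of the standard (dx, dy, dw, dh) regression deltas, i.e. the direction in which each coordinate should move. Both functions are hypothetical illustrations, not the paper's definitions.

```python
import numpy as np

Box = tuple  # (x1, y1, x2, y2)

def area(b: Box) -> float:
    """Box area, zero for empty or inverted boxes."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def intersection(a: Box, b: Box) -> float:
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))

def visible_iou(proposal: Box, full_box: Box, visible_box: Box) -> float:
    """ASSUMED form: overlap with the visible part over union with the full body box."""
    inter_visible = intersection(proposal, visible_box)
    union_full = area(proposal) + area(full_box) - intersection(proposal, full_box)
    return inter_visible / union_full

def sign_targets(proposal: Box, gt: Box) -> np.ndarray:
    """Sign (+1, 0, -1) of the standard center/size regression deltas (assumed target)."""
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    pcx, pcy = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    gcx, gcy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    deltas = np.array([(gcx - pcx) / pw, (gcy - pcy) / ph, np.log(gw / pw), np.log(gh / ph)])
    return np.sign(deltas).astype(int)

# Pedestrian whose lower half is occluded: full box (0, 0, 10, 20), visible upper half.
full, visible = (0, 0, 10, 20), (0, 0, 10, 10)
print(visible_iou((0, 10, 10, 20), full, visible))  # proposal on the occluded half
print(visible_iou((0, 0, 10, 10), full, visible))   # proposal on the visible half
```

Under plain IoU both half-body proposals score 0.5 against the full box. The visible variant rejects the one covering only the occluder, which is the kind of sample the abstract calls improper.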
“…Our design strategy is to select the model of the highest accuracy from the existing state-of-the-art ones at first and then improve the efficiency of the model. Among the existing models, Cascade R-CNN [6] with ResNeXt-101 [10] backbone has the best accuracy on MS COCO dataset [11]. To further boost the performance, we add Feature Pyramid Network (FPN) [5] to the backbone of the Cascade R-CNN model so that features at different scales can be extracted better.…”
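Adding FPN to the backbone means merging coarse, semantically strong maps into finer ones through a top-down pathway. The NumPy sketch below shows that merge: 1x1 lateral projections, 2x nearest-neighbour upsampling, and element-wise addition. The tiny channel counts and all-ones weights are illustrative assumptions. The 3x3 smoothing convolution that FPN applies to each merged map is omitted for brevity.

```python
import numpy as np

def conv1x1(feature: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: (C_in, H, W) with weight (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, feature)

def upsample2x(feature: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return feature.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(backbone_maps, lateral_weights):
    """Maps ordered fine-to-coarse (e.g. C3, C4, C5); each level halves the resolution."""
    laterals = [conv1x1(c, w) for c, w in zip(backbone_maps, lateral_weights)]
    outputs = [laterals[-1]]  # coarsest level starts the top-down pass
    for lateral in reversed(laterals[:-1]):
        outputs.append(lateral + upsample2x(outputs[-1]))
    return outputs[::-1]  # return fine-to-coarse, like the inputs

# Hypothetical backbone outputs with 8, 16 and 32 channels, projected to 4 channels.
maps = [np.ones((8, 16, 16)), np.ones((16, 8, 8)), np.ones((32, 4, 4))]
weights = [np.ones((4, 8)), np.ones((4, 16)), np.ones((4, 32))]
pyramid = fpn_top_down(maps, weights)
print([p.shape for p in pyramid])
```

Every pyramid level ends up with the same channel count. This lets a single detection head run on all scales, which is why FPN helps with objects of varying size.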
Section: A Design of High-Accuracy Model (mentioning)
It is hard to detect on-road objects under varying lighting conditions. To improve the quality of the classifier, three techniques are used. First, we define subclasses to separate daytime and nighttime samples. Second, we skip similar samples in the training set to prevent overfitting. Third, we add outside training samples, which further improves detection accuracy. To detect objects on an edge device, the Nvidia Jetson TX2 platform, we adopt the lightweight ResNet-18 FPN as the backbone feature extractor. The FPN (Feature Pyramid Network) generates good features for detecting objects over various scales. With the Cascade R-CNN technique, the bounding boxes are iteratively refined for better results.
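The abstract does not say how similar samples are identified. One simple approach, assumed here for illustration, fits training data drawn from video. It keeps a frame only if its mean absolute pixel difference from the last kept frame exceeds a threshold. The function name and the threshold value are hypothetical.

```python
import numpy as np

def skip_similar_frames(frames, threshold: float) -> list:
    """Return indices of frames kept after skipping near-duplicates of the last kept frame."""
    kept = []
    last = None
    for idx, frame in enumerate(frames):
        current = np.asarray(frame, dtype=np.float32)
        # Keep the first frame, then any frame that differs enough from the last kept one.
        if last is None or np.mean(np.abs(current - last)) >= threshold:
            kept.append(idx)
            last = current
    return kept

# Synthetic 4x4 "frames" with constant intensities; frames 1 and 3 are near-duplicates.
frames = [np.full((4, 4), v) for v in (0.0, 1.0, 10.0, 10.5, 30.0)]
print(skip_similar_frames(frames, threshold=5.0))
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents a slow drift of tiny changes from letting a long run of almost identical frames through.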