2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018)
DOI: 10.1109/cvpr.2018.00119

R-FCN-3000 at 30fps: Decoupling Detection and Classification

Abstract: We present R-FCN-3000, a large-scale real-time object detector in which objectness detection and classification are decoupled. To obtain the detection score for an RoI, we multiply the objectness score with the fine-grained classification score. Our approach is a modification of the R-FCN architecture in which position-sensitive filters are shared across different object classes for performing localization. For fine-grained classification, these position-sensitive filters are not needed. R-FCN-3000 obtains an …
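The scoring rule described in the abstract (detection score = objectness score × fine-grained classification score) can be sketched in a few lines. The NumPy sketch below is illustrative only and assumes the position-sensitive objectness bins and the per-class logits for one RoI have already been pooled; the function name, shapes, and the simplified pooling step are hypothetical and not taken from the authors' implementation.

import numpy as np

def rfcn3000_roi_scores(ps_objectness_bins, cls_logits):
    """Sketch of the decoupled scoring for a single RoI.

    ps_objectness_bins : class-agnostic position-sensitive objectness values
                         already pooled over the RoI, e.g. shape (k*k,).
    cls_logits         : fine-grained classification logits for the same RoI,
                         e.g. shape (3000,).
    """
    # Objectness branch: average the k*k position-sensitive bins and squash
    # to a probability (the actual position-sensitive pooling is simplified).
    objectness = 1.0 / (1.0 + np.exp(-ps_objectness_bins.mean()))

    # Classification branch: plain softmax over the fine-grained classes;
    # no position-sensitive filters are needed for this branch.
    exp = np.exp(cls_logits - cls_logits.max())
    class_probs = exp / exp.sum()

    # Final detection score per class = objectness * classification score.
    return objectness * class_probs

# Example usage with random inputs (illustrative only).
scores = rfcn3000_roi_scores(np.random.randn(49), np.random.randn(3000))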


Cited by 97 publications (70 citation statements)
References 45 publications
“…In 2016, Dai et al. [11] proposed R-FCN (Region-based Fully Convolutional Networks) to solve the problem that the RoI-wise subnetwork of Faster R-CNN did not share computation across different region proposals. In the past two years, object proposal-based approaches built on Faster R-CNN and R-FCN, such as RRPN (Rotation Region Proposal Networks) [12], R-FCN-3000 [13], and others [14], [15], further improved detection accuracy. However, the frameworks of these proposal-based approaches, with their two stages of region proposal generation and subsequent feature resampling, were much more complex than the regression-based approaches, which resulted in low speed and difficulty in achieving real-time performance.…”
mentioning
confidence: 99%
“…[Table 3 excerpt] Method | Training Data/Label | mAP-CG | mAP-FG
SNIPER [28] | CG-Fully | 54.0 | -
SNIPER [28] | 3k-FG-Fully | - | 41.6
YOLO-9000* [24] | COCO+9k-FG-Weakly | 19.9 | -
R-FCN-3000* [27] | 3k-FG-Fully | … | …
We then run experiments on the ImageNet dataset. As shown in Table 3, we use the ILSVRC 2014 Detection set with 200 classes as the coarse-grained set.…”
Section: Methods
mentioning
confidence: 99%
“…In [41], a weakly-supervised object detector is trained on a weakly-labeled web dataset to generate pseudo ground-truths for the target detection task. [37] combines region-level semantic similarity and common-sense information learned from some external knowledge bases to train the detector with just image-level labels.…”
Section: Related Work
mentioning
confidence: 99%
“…For example, YOLO-9000 [33] extends the detector's class coverage by concurrently training on bounding box-level data and image-level data, such that the image-level data contribute only to the classification loss. By decoupling the detection network into two branches (position-sensitive & semantic-focused), R-FCN-3K [37] is able to scale detection up to 3000 classes despite being trained with bounding box annotations for only a limited set of object classes. In contrast to these, we focus on large-scale object detection without having access to additional (classification) data sources during training.…”
Section: Related Work
mentioning
confidence: 99%