Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian

doi:10.1109/tpami.2015.2389824

Cited by 8,778 publications

(4,569 citation statements)

References 21 publications

Citation types: Supporting, Mentioning (4,490), Contrasting, Unclassified

“…These algorithms are mainly classified into two groups: one is object detection method based on region proposal [58][59][60], which is a mainstream algorithm, e.g., RCNN [31], SPPNet [61], Fast-RCNN [34], Faster-RCNN [62], and MSRA recently proposes algorithm R-FCN [63]. The other is not using the region proposal method to detection, e.g., YOLO [64] and SSD [65].…”

Section: Related Work (mentioning)

confidence: 99%

“…Due to the fact that there is a large amount of overlap between these RoIs, redundant calculations result in inefficiencies. SSP-Net [61] and Fast-RCNN [34] propose a shared feature method that is extracted only one time for the whole image for this problem. And then, about 2000 RoIs are mapped according to their location information to the feature vector of the whole image to obtain the features of each RoI, so it greatly improves the speed of calculation because the feature extraction calculations of different RoI can be shared.…”

Section: Related Work (mentioning)

confidence: 99%

Aircraft detection in remote sensing images based on saliency and convolution neural network

Yang, Han, et al. 2018, J Wireless Com Network

New algorithms and architectures for current industrial wireless sensor networks need to be explored to ensure efficiency, robustness, and consistency in variable application environments such as the smart grid, water supply, and gas monitoring. Automatic object detection in remote sensing images has long been an active research topic. When a conventional region-proposal-based deep convolutional network is used for detection, the generated region proposals contain many negative samples, which reduces detection precision and efficiency. Saliency uses the human visual attention mechanism to achieve bottom-up object detection. Because replacing selective search with saliency can greatly reduce the number of proposals, we obtain regions of interest (RoIs) and their position information for the remote sensing image with a saliency algorithm based on the background prior. The position information is then mapped to the feature vector of the whole image obtained by a deep convolutional neural network. Finally, each RoI is classified and its bounding box is fine-tuned. In this paper, our model is compared with Fast-RCNN, which is the current state-of-the-art detection model. The mAP of our model reaches 99%, which is 12.4% higher than that of Fast-RCNN. In addition, we study the effect of the number of training iterations and find that the model trained for 10,000 iterations already achieves high accuracy. Finally, we compare results for different numbers of negative samples and find that detection accuracy is highest when the number of negative samples reaches 400.
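The shared-feature scheme described in the citation statements above (features computed once for the whole image, then each RoI mapped onto that feature map) can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the implementation of any of the cited papers: the function names `project_roi` and `spp_pool`, the stride of 16, the pyramid levels (1, 2, 4), and the simple floor/ceil rounding of box coordinates are all hypothetical choices made for this example.

```python
def project_roi(box, stride, fmap_h, fmap_w):
    """Map an image-space box (x0, y0, x1, y1) to feature-map cells.

    Assumption: the feature map is `stride` times smaller than the image,
    and the box is rounded outward (floor for the start, ceil for the end).
    The returned box is end-exclusive and always at least one cell wide.
    """
    x0, y0, x1, y1 = box
    fx0 = min(max(x0 // stride, 0), fmap_w - 1)
    fy0 = min(max(y0 // stride, 0), fmap_h - 1)
    fx1 = min(max(-(-x1 // stride), fx0 + 1), fmap_w)  # -(-a // b) is ceil(a / b)
    fy1 = min(max(-(-y1 // stride), fy0 + 1), fmap_h)
    return fx0, fy0, fx1, fy1


def spp_pool(fmap, roi, levels=(1, 2, 4)):
    """Max-pool one RoI into a fixed-length vector with a spatial pyramid.

    fmap is a nested list indexed [channel][row][col]; roi is in feature-map
    cells, end-exclusive. For each level n the RoI is split into an n x n
    grid of (possibly overlapping) bins and the maximum of each bin is kept,
    so the output length is channels * sum(n * n for n in levels),
    regardless of the RoI size.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    vec = []
    for n in levels:
        for i in range(n):
            r0 = y0 + (i * h) // n                 # floor(i * h / n)
            r1 = y0 + ((i + 1) * h + n - 1) // n   # ceil((i + 1) * h / n)
            for j in range(n):
                c0 = x0 + (j * w) // n
                c1 = x0 + ((j + 1) * w + n - 1) // n
                for channel in fmap:
                    vec.append(max(channel[r][c]
                                   for r in range(r0, r1)
                                   for c in range(c0, c1)))
    return vec


# Toy "shared" feature map: 2 channels, 6 rows, 8 columns, computed once.
fmap = [[[ch * 100 + r * 8 + c for c in range(8)] for r in range(6)]
        for ch in range(2)]

# Two RoIs of different sizes reuse the same feature map.
roi_a = project_roi((0, 0, 128, 96), 16, 6, 8)
roi_b = project_roi((20, 10, 70, 90), 16, 6, 8)
vec_a = spp_pool(fmap, roi_a)
vec_b = spp_pool(fmap, roi_b)
print(len(vec_a), len(vec_b))
```

Both RoIs yield vectors of the same length (2 channels times 1 + 4 + 16 bins), which is what allows regions of arbitrary size to feed a fixed-size classifier without re-running the convolutional layers for each region.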

“…However, because it performs a ConvNet for each object proposal, the time spent on computing region proposals and features (13s/image on a Graphics Processing Unit (GPU) or 53s/image on a CPU) cannot be ignored for an object detection system. Inspired by the Spatial pyramid pooling networks (SPPnets) [40], Girshick [34] proposed Fast R-CNN to speed up R-CNN by sharing computation. The network processed all the images with a CNN to produce a conv feature map.…”

Section: Deep Learning In Computer Vision (mentioning)

confidence: 99%

Ear Detection under Uncontrolled Conditions with Multiple Scale Faster Region-Based Convolutional Neural Networks

Zhang 2017, Symmetry

Ear detection is an important step in ear recognition approaches. Most existing ear detection techniques are based on manually designing features or shallow learning algorithms. However, researchers found that the pose variation, occlusion, and imaging conditions provide a great challenge to the traditional ear detection methods under uncontrolled conditions. This paper proposes an efficient technique involving Multiple Scale Faster Region-based Convolutional Neural Networks (Faster R-CNN) to detect ears from 2D profile images in natural images automatically. Firstly, three regions of different scales are detected to infer the information about the ear location context within the image. Then an ear region filtering approach is proposed to extract the correct ear region and eliminate the false positives automatically. In an experiment with a test set of 200 web images (with variable photographic conditions), 98% of ears were accurately detected. Experiments were likewise conducted on the Collection J2 of University of Notre Dame Biometrics Database (UND-J2) and University of Beira Interior Ear dataset (UBEAR), which contain large occlusion, scale, and pose variations. Detection rates of 100% and 98.22%, respectively, demonstrate the effectiveness of the proposed approach.

“…High spatial resolution (HSR) remote sensing imaging sensors can now acquire aerial and satellite images with abundant detail and complex spatial structural information, which can be used in a wide range of civil and engineering applications, such as segmentation [4], scene annotation [5], object detection [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22] (e.g., airplane detection [6,12], urban area detection [13], vehicle detection [21,22]), scene classification and recognition [23][24][25][26][27], etc. Differing from natural imagery obtained by the camera on the ground from a horizontal view, HSR remote sensing imagery is obtained by satellite-borne or space-borne sensors from a top-down view, which is an approach that can be easily influenced by weather and illumination conditions.…”

Section: Introduction (mentioning)

confidence: 99%

An Efficient and Robust Integrated Geospatial Object Detection Framework for High Spatial Resolution Remote Sensing Imagery

2017

Geospatial object detection from high spatial resolution (HSR) remote sensing imagery is a significant and challenging problem when further analyzing object-related information for civil and engineering applications. However, the computational efficiency and the separate region generation and localization steps are two big obstacles for the performance improvement of the traditional convolutional neural network (CNN)-based object detection methods. Although recent object detection methods based on CNN can extract features automatically, these methods still separate the feature extraction and detection stages, resulting in high time consumption and low efficiency. As a significant influencing factor, the acquisition of a large quantity of manually annotated samples for HSR remote sensing imagery objects requires expert experience, which is expensive and unreliable. Despite the progress made in natural image object detection fields, the complex object distribution makes it difficult to directly deal with the HSR remote sensing imagery object detection task. To solve the above problems, a highly efficient and robust integrated geospatial object detection framework based on faster region-based convolutional neural network (Faster R-CNN) is proposed in this paper. The proposed method realizes the integrated procedure by sharing features between the region proposal generation stage and the object detection stage. In addition, a pre-training mechanism is utilized to improve the efficiency of the multi-class geospatial object detection by transfer learning from the natural imagery domain to the HSR remote sensing imagery domain. Extensive experiments and comprehensive evaluations on a publicly available 10-class object detection dataset were conducted to evaluate the proposed method.
