Detecting vehicles in aerial imagery plays an important role in a wide range of applications. Current vehicle detection methods are mostly based on sliding-window search and handcrafted or shallow-learning-based features, which have limited descriptive capability and heavy computational costs. Recently, owing to their powerful feature representations, region-based convolutional neural network (CNN) detection methods, especially Faster R-CNN, have achieved state-of-the-art performance in computer vision. However, directly applying Faster R-CNN to vehicle detection in aerial images has notable limitations: (1) the region proposal network (RPN) in Faster R-CNN performs poorly at accurately locating small vehicles because of its relatively coarse feature maps; and (2) the classifier after the RPN cannot distinguish vehicles from complex backgrounds well. In this study, an improved detection method based on Faster R-CNN is proposed to address these two limitations. First, to improve recall, we employ a hyper region proposal network (HRPN) to extract vehicle-like targets with a combination of hierarchical feature maps. Then, we replace the classifier after the RPN with a cascade of boosted classifiers to verify the candidate regions, aiming to reduce false detections through negative example mining. We evaluate our method on the Munich vehicle dataset and a collected vehicle dataset, and it shows improvements in accuracy and robustness compared with existing methods.
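The negative example mining mentioned in this abstract can be illustrated with a minimal, generic sketch. It is not the authors' procedure: the function names, the 0.5 IoU threshold, and the `top_k` parameter are all assumptions chosen for illustration. The idea is to keep the highest-scoring candidate regions that do not overlap any ground-truth vehicle, so that a later classifier stage can be retrained on these hard false positives.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an (N, 4) array of boxes in (x1, y1, x2, y2) form."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def mine_hard_negatives(proposals, scores, gt_boxes, iou_thresh=0.5, top_k=2):
    """Indices of the top_k highest-scoring proposals whose best IoU with any
    ground-truth box is below iou_thresh (hypothetical helper; assumes at
    least one ground-truth box)."""
    max_iou = np.array([iou_one_to_many(p, gt_boxes).max() for p in proposals])
    neg_idx = np.flatnonzero(max_iou < iou_thresh)   # background proposals
    order = np.argsort(-scores[neg_idx])             # most confident first
    return neg_idx[order[:top_k]]

# Toy data: one ground-truth vehicle and five scored proposals.
gt = np.array([[0.0, 0.0, 10.0, 10.0]])
props = np.array([[0, 0, 10, 10], [20, 20, 30, 30], [40, 40, 50, 50],
                  [1, 1, 11, 11], [60, 60, 70, 70]], dtype=float)
scores = np.array([0.9, 0.8, 0.3, 0.7, 0.95])
hard = mine_hard_negatives(props, scores, gt)
```

In this toy example, proposals 0 and 3 overlap the ground truth and are treated as positives; of the remaining background proposals, the two with the highest scores are returned.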
Recent years have witnessed ever-mounting interest in sparse representation research. The Sparse Representation-based Classification (SRC) framework has been widely applied as a classifier in numerous domains, among which Synthetic Aperture Radar (SAR) target recognition is particularly challenging because interpreting SAR images is still an open problem. In this paper, SRC is used to classify the 10-class moving and stationary target acquisition and recognition (MSTAR) targets, a standard SAR data set. Before classification, the image sizes need to be normalized to retain the useful information, namely the target and its shadow, and to suppress speckle noise. Specifically, a preprocessing method is recommended to extract feature vectors from the images, and the feature vector of each test sample can be represented as a sparse linear combination of basis vectors generated from the feature vectors of the training samples. The sparse representation is then solved by ℓ1-norm minimization. Finally, the identities of the test samples are inferred from the reconstruction errors calculated with the sparse coefficients. Experimental results demonstrate the good performance of SRC. Additionally, the average recognition rate under different feature spaces and the recognition rate of each target are discussed.
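The SRC pipeline described in this abstract (sparse coding over a dictionary of training features, then classification by class-wise reconstruction error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not name a solver, so the sketch uses ISTA on the Lagrangian (lasso) form of the ℓ1 problem, and the function names, `lam`, `n_iter`, and the toy dictionary are all hypothetical.

```python
import numpy as np

def ista_l1(D, y, lam=0.01, n_iter=500):
    """Approximately solve min_x 0.5*||y - D x||_2^2 + lam*||x||_1 with ISTA."""
    # Step size 1/L, where L = largest eigenvalue of D^T D (squared spectral norm).
    L = np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)                 # gradient of the quadratic term
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

def src_classify(D, labels, y, lam=0.01, n_iter=500):
    """Return (predicted class, per-class residuals) for test vector y.

    D holds one training feature vector per column; labels gives each
    column's class. Columns and y are scaled to unit l2 norm first.
    """
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    y = y / np.linalg.norm(y)
    x = ista_l1(D, y, lam, n_iter)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        xc = np.where(labels == c, x, 0.0)       # keep only class-c coefficients
        residuals.append(np.linalg.norm(y - D @ xc))
    residuals = np.array(residuals)
    return classes[int(np.argmin(residuals))], residuals

# Toy dictionary: two atoms per class in a 3-D feature space.
D = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.1]])
labels = np.array([0, 0, 1, 1])
pred, res = src_classify(D, labels, np.array([1.0, 0.05, 0.0]))
```

The test vector lies close to the class-0 atoms, so the class-0 coefficients reconstruct it with the smaller residual and `pred` is class 0.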
Feature extraction is a crucial step for any automatic target recognition process, especially in the interpretation of synthetic aperture radar (SAR) imagery. In order to obtain distinctive features, this paper proposes a feature fusion algorithm for SAR target recognition based on a stacked autoencoder (SAE). The detailed procedure presented in this paper can be summarized as follows. First, 23 baseline features and Three-Patch Local Binary Pattern (TPLBP) features are extracted. These features describe the global and local aspects of the image with less redundancy and more complementarity, providing richer information for feature fusion. Second, an effective feature fusion network is designed: baseline and TPLBP features are cascaded and fed into an SAE. Then, with an unsupervised learning algorithm, the SAE is pre-trained by a greedy layer-wise training method. With its capacity for feature expression, the SAE makes the fused features more distinguishable. Finally, the model is fine-tuned with a softmax classifier and applied to target classification. On the 10-class SAR targets of the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset, the method achieves a classification accuracy of up to 95.43%, which verifies the effectiveness of the presented algorithm.
Vehicle detection with orientation estimation in aerial images has received widespread interest as it is important for intelligent traffic management. This is a challenging task, not only because of the complex background and relatively small size of the targets, but also because of the various orientations of vehicles in aerial images captured from a top view. Existing methods for oriented vehicle detection need several post-processing steps to generate final detection results with orientation, which is not efficient enough. Moreover, they can only obtain discrete orientation information for each target. In this paper, we present an end-to-end single convolutional neural network that generates arbitrarily-oriented detection results directly. Our approach, named Oriented_SSD (Single Shot MultiBox Detector, SSD), uses a set of default boxes with various scales at each feature map location to produce detection bounding boxes. Meanwhile, offsets are predicted for each default box to better match the object shape; these include the angle parameter used to generate oriented bounding boxes. Evaluation results on the public DLR Vehicle Aerial dataset and the Vehicle Detection in Aerial Imagery (VEDAI) dataset demonstrate that our method can detect both the location and orientation of vehicles with high accuracy and fast speed. For test images in the DLR Vehicle Aerial dataset with a size of 5616 × 3744, our method achieves 76.1% average precision (AP) and 78.7% correct direction classification at 5.17 s on an NVIDIA GTX-1060.
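The step of turning a default box plus predicted offsets into an oriented bounding box can be sketched as below. The abstract does not give the exact encoding, so everything here is an assumption for illustration: centre offsets scaled by the default box size, log-scale width and height offsets (as in common SSD-style decoders), and an angle in radians taken directly from the prediction. The function names are hypothetical.

```python
import numpy as np

def decode_oriented_box(default_box, offsets):
    """Decode offsets (dx, dy, dw, dh, theta) against a default box (cx, cy, w, h).

    Assumed encoding, not taken from the source: centre shifts are relative
    to the default box size, sizes use a log scale, theta is in radians.
    """
    cx_d, cy_d, w_d, h_d = default_box
    dx, dy, dw, dh, theta = offsets
    cx = cx_d + dx * w_d
    cy = cy_d + dy * h_d
    w = w_d * np.exp(dw)
    h = h_d * np.exp(dh)
    return cx, cy, w, h, theta

def oriented_box_corners(cx, cy, w, h, theta):
    """Return the four corners (4 x 2 array) of a box rotated by theta about its centre."""
    half = np.array([[-w / 2, -h / 2], [w / 2, -h / 2],
                     [w / 2, h / 2], [-w / 2, h / 2]])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])              # 2-D rotation matrix
    return half @ R.T + np.array([cx, cy])       # rotate, then translate

# A 4 x 2 default box centred at (10, 10), rotated by 90 degrees.
box = decode_oriented_box((10.0, 10.0, 4.0, 2.0), (0.0, 0.0, 0.0, 0.0, np.pi / 2))
corners = oriented_box_corners(*box)
```

Predicting a continuous angle alongside the usual four offsets is what lets a single forward pass produce arbitrarily oriented boxes, without a separate orientation step afterwards.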