Vehicle detection in aerial images is an important and challenging task. Traditional target detection models based on a sliding-window search achieved acceptable accuracy, but they are time-consuming in the detection phase. Recently, with the great success of convolutional neural networks (CNNs) in computer vision, many state-of-the-art detectors have been built on deep CNNs. However, these CNN-based detectors perform poorly on aerial imagery because existing CNN-based models struggle with small-object detection and precise localization. To improve detection accuracy without sacrificing speed, we propose a detection model that cascades two independent convolutional neural networks: the first network generates a set of vehicle-like regions from multi-feature maps drawn from different hierarchies and scales. Because these multi-feature maps combine the advantages of deep and shallow convolutional layers, the first network performs well at locating small targets in aerial imagery. The generated candidate regions are then fed into the second network for feature extraction and decision making. Comprehensive experiments on the Vehicle Detection in Aerial Imagery (VEDAI) dataset and the Munich vehicle dataset show that the proposed cascaded model yields high performance in both detection accuracy and detection speed.
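The core idea of the multi-feature maps can be illustrated with a minimal NumPy sketch: a coarse deep-layer feature map is upsampled and concatenated with a fine shallow-layer map, so the fused map keeps both spatial detail (for small targets) and semantics. The function names and the nearest-neighbor upsampling here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def fuse_features(shallow, deep):
    """Concatenate a shallow feature map with an upsampled deep one.

    shallow: (C1, H, W)     -- fine spatial detail, helps small targets
    deep:    (C2, H/2, W/2) -- coarse but semantically strong
    returns: (C1 + C2, H, W) multi-feature map
    """
    return np.concatenate([shallow, upsample2x(deep)], axis=0)

# toy example: 4x4 shallow map fused with a 2x2 deep map
shallow = np.ones((8, 4, 4))
deep = np.arange(4, dtype=float).reshape(1, 2, 2)
fused = fuse_features(shallow, deep)
print(fused.shape)  # (9, 4, 4)
```

A region-proposal head would then slide over `fused` instead of a single-layer map, which is why the first network can localize small vehicles more precisely.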
Background subtraction (BS) is one of the most commonly encountered tasks in video analysis and tracking systems. It distinguishes the foreground (moving objects) from video sequences captured by static imaging sensors, and BS in remote scene infrared (IR) video is important in many fields. This paper presents a Remote Scene IR Dataset captured by our custom-designed medium-wave infrared (MWIR) sensor. Each video sequence in the dataset is annotated with specific BS challenges, and a pixel-wise foreground (FG) ground truth is provided for each frame. A series of experiments was conducted to evaluate BS algorithms on the proposed dataset, comparing their overall performance as well as their processor and memory requirements. Appropriate evaluation metrics were employed to assess each BS algorithm's ability to handle the different BS challenges represented in the dataset. The results and conclusions provide useful references for developing new BS algorithms for remote scene IR video sequences; some of the findings are not limited to remote scene or IR video but apply to background subtraction in general. The Remote Scene IR dataset and the foreground masks detected by each evaluated BS algorithm are available online: .
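As context for the algorithms being benchmarked, a classic BS baseline is the running-average background model: the background estimate is updated slowly, and pixels that deviate from it are flagged as foreground. This NumPy sketch is a generic textbook baseline, not one of the specific algorithms evaluated in the paper, and the parameter values are illustrative.

```python
import numpy as np

def background_subtract(frames, alpha=0.05, threshold=30.0):
    """Running-average background subtraction (a classic BS baseline).

    frames:    iterable of (H, W) grayscale frames
    alpha:     background learning rate
    threshold: absolute-difference threshold for foreground
    Yields one boolean foreground mask per frame.
    """
    background = None
    for frame in frames:
        frame = frame.astype(float)
        if background is None:
            background = frame.copy()  # initialize from the first frame
        mask = np.abs(frame - background) > threshold
        # update the model only at pixels that look like background,
        # so foreground objects do not get absorbed into it
        background = np.where(mask, background,
                              (1 - alpha) * background + alpha * frame)
        yield mask
```

Benchmarks like the one in this paper typically score such per-frame masks against the pixel-wise ground truth with metrics such as precision, recall, and F-measure.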
With the development of deep learning, more and more deep learning algorithms are being applied to remote sensing image classification, detection, and semantic segmentation. Deep-learning-based landslide semantic segmentation of remote sensing images mainly uses supervised learning, whose accuracy depends on large amounts of training data with high-quality annotations. At present, high-quality annotation requires significant human effort, so the high cost of annotating remote sensing landslide images greatly restricts the development of landslide semantic segmentation algorithms. To address this high labeling cost, we propose a weakly supervised method for landslide semantic segmentation that combines class activation maps (CAMs) and a cycle-consistent generative adversarial network (CycleGAN). The method uses image-level annotations in place of pixel-level annotations as training data. First, the CAM method determines the approximate position of the landslide area. Then, CycleGAN generates a fake, landslide-free version of the image; differencing it with the real image yields an accurate segmentation of the landslide area. In this way, pixel-level segmentation of the landslide area in a remote sensing image is realized. We evaluated the proposed method with mean intersection-over-union (mIoU) and compared it with a CAM-based method: on the same test dataset, the CAM-based method achieved an mIoU of 0.157, while ours achieved 0.237. As a further comparison, a fully supervised U-Net reached an mIoU of 0.408. The experimental results show that weakly supervised learning is a feasible way to realize landslide semantic segmentation in remote sensing images, and it can greatly reduce the workload of data annotation.
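The difference-then-threshold step and the mIoU metric used above can be sketched as follows. The function names, the per-pixel mean-absolute-difference rule, and the threshold value are illustrative assumptions; the paper's actual pipeline also uses the CAM localization to restrict where the difference is taken.

```python
import numpy as np

def landslide_mask(real, fake, threshold=0.2):
    """Segment the landslide as the region where the real image differs
    from its generated landslide-free counterpart (simplified sketch).

    real, fake: (H, W, 3) images with values in [0, 1]
    returns:    (H, W) boolean landslide mask
    """
    return np.abs(real - fake).mean(axis=-1) > threshold

def miou(pred, gt):
    """Mean intersection-over-union over {background, landslide}."""
    ious = []
    for cls in (False, True):
        inter = np.logical_and(pred == cls, gt == cls).sum()
        union = np.logical_or(pred == cls, gt == cls).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

With this convention, a perfect prediction scores an mIoU of 1.0, and the 0.157 / 0.237 / 0.408 figures reported above are averages of the same per-class IoU over the test set.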