Building footprint extraction from high-resolution aerial images is always an essential part of urban dynamic monitoring, planning, and management. It has also been a challenging task in remote sensing research. In recent years, deep neural networks have made great achievement in improving the accuracy of building extraction from remote sensing imagery. However, most of the existing approaches usually require a large amount of parameters and floating point operations for high accuracy, it leads to high memory consumption and low inference speed which are harmful to research. In this paper, we proposed a novel efficient network named ESFNet which employs separable factorized residual block and utilizes the dilated convolutions, aiming to preserve slight accuracy loss with low computational cost and memory consumption. Our ESFNet obtains a better trade-off between accuracy and efficiency, it can run at over 100 FPS on single Tesla V100, requires 6x fewer FLOPs and has 18x fewer parameters than state-of-theart real-time architecture ERFNet while preserving similar accuracy without any additional context module, post-processing and pre-trained scheme. We evaluated our networks on WHU building dataset and compared it with other state-of-the-art architectures. The result and comprehensive analysis show that our networks are benefit for efficient remote sensing researches, and the idea can be further extended to other areas.
The segmentation of high-resolution (HR) remote sensing images is very important in modern society, especially in the fields of industry, agriculture and urban modelling. Through the neural network, the machine can effectively and accurately extract the surface feature information. However, using the traditional deep learning methods requires plentiful efforts in order to find a robust architecture. In this paper, we introduce a neural network architecture search (NAS) method, called NAS-HRIS, which can automatically search neural network architecture on the dataset. The proposed method embeds a directed acyclic graph (DAG) into the search space and designs the differentiable searching process, which enables it to learn an end-to-end searching rule by using gradient descent optimization. It uses the Gumbel-Max trick to provide an efficient way when drawing samples from a non-continuous probability distribution, and it improves the efficiency of searching and reduces the memory consumption. Compared with other NAS, NAS-HRIS consumes less GPU memory without reducing the accuracy, which corresponds to a large amount of HR remote sensing imagery data. We have carried out experiments on the WHUBuilding dataset and achieved 90.44% MIoU. In order to fully demonstrate the feasibility of the method, we made a new urban Beijing Building dataset, and conducted experiments on satellite images and non-single source images, achieving better results than SegNet, U-Net and Deeplab v3+ models, while the computational complexity of our network architecture is much smaller.
Deforestation in the Amazon rainforest results in reduced biodiversity, habitat loss, climate change, and other destructive impacts. Hence obtaining location information on human activities is essential for scientists and governments working to protect the Amazon rainforest. We propose a novel remote sensing image classification framework that provides us with the key data needed to more effectively manage deforestation and its consequences. We introduce the attention module to separate the features which are extracted from CNN(Convolutional Neural Network) by channel, then further send the separated features to the LSTM(Long-Short Term Memory) network to predict labels sequentially. Moreover, we propose a loss function by calculating the co-occurrence matrix of all labels in the dataset and assigning different weights to each label. Experimental results on the satellite image dataset of the Amazon rainforest show that our model obtains a better F 2 score compared to other methods, which indicates that our model is effective in utilizing label dependencies to improve the performance of multi-label image classification.
Building footprint extraction from high-resolution aerial images is always an essential part of urban dynamic monitoring, planning and management. It has also been a challenging task in remote sensing research. In recent years, deep neural networks have made great achievement in improving accuracy of building extraction from remote sensing imagery. However, most of existing approaches usually require large amount of parameters and floating point operations for high accuracy, it leads to high memory consumption and low inference speed which are harmful to research. In this paper, we proposed a novel efficient network named ESFNet which employs separable factorized residual block and utilizes the dilated convolutions, aiming to preserve slight accuracy loss with low computational cost and memory consumption. Our ESFNet obtains a better trade-off between accuracy and efficiency, it can run at over 100 FPS on single Tesla V100, requires 6x fewer FLOPs and has 18x fewer parameters than state-of-the-art real-time architecture ERFNet while preserving similar accuracy without any additional context module, post-processing and pre-trained scheme. We evaluated our networks on WHU Building Dataset and compared it with other state-of-the-art architectures. The result and comprehensive analysis show that our networks are benefit for efficient remote sensing researches, and the idea can be further extended to other areas. The code is public available at: https://github.com/mrluin/ESFNet-Pytorch
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.