Big data analysis assumes a significant role in Earth observation using remote sensing images, since the explosion of data images from multiple sensors is used in several fields. The traditional data analysis techniques have different limitations on storing and processing massive volumes of data. Besides, big remote sensing data analytics demand sophisticated algorithms based on specific techniques to store to process the data in real-time or in near real-time with high accuracy, efficiency, and high speed. In this paper, we present a method for storing a huge number of heterogeneous satellite images based on Hadoop distributed file system (HDFS) and Apache Spark. We also present how deep learning algorithms such as VGGNet and UNet can be beneficial to big remote sensing data processing for feature extraction and classification. The obtained results prove that our approach outperforms other methods.
Spaceborne and airborne sensors deliver a huge number of Earth Observation Data every day. In this context, we can easily observe the whole earth from its different sides. Therefore, this big data is important in remote sensing and could be exploited in several domains requiring image classification, natural hazard monitoring, global climate change, agriculture, urban planning. Over the last five years, Convolutional Neural Networks (CNN) emerged as the most successful technique for the image classification task, as well as a number of other computer vision tasks. However, to train millions of parameters in CNN one requires a huge amount of annotated data. This requirement leads to a significant challenge if the available training data is limited for a target task at hand. To address this challenge, in the recent literature, researchers proposed various ways to apply a technique called Transfer Learning to transfer the knowledge gained by training CNNs parameters on some large annotated dataset to the target task with limited availability of training data. Most of our work in this paper was dedicated to proposing a hybrid classification of remote sensing images. This architecture combines Spark RDD image coding to consider image's local regions, pre-trained VGGNET-16 and UNET for image segmentation and SVM (Support Vector Machines) from spark Machine Learning to achieve labeling task.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.