Spatial neighboring analysis is an indispensable part of geo-raster spatial analysis. In the big data era, high-resolution raster data offer abundant and valuable information, but they also pose enormous computational challenges for existing focal statistics algorithms. Simply employing the in-memory computing framework Spark for such applications can incur performance issues because Spark lacks native support for spatial data. In this article, we present a Spark-based parallel computing approach for the focal algorithms of neighboring analysis. The approach manipulates large amounts of terrain data efficiently in three steps: (1) partitioning a raster digital elevation model (DEM) file into multiple square tile files, using a tile-based multifile storing strategy suited to the Hadoop Distributed File System (HDFS); (2) performing the quintessential slope algorithm on these tile files using a dynamic calculation window (DCW) computing strategy; and (3) writing back and merging the calculation results into a single raster file. Experiments with the digital elevation data of Australia show that the proposed approach effectively improves the parallel performance of focal statistics algorithms. The results also show that its calculation accuracy is almost the same as that of ArcGIS, and that it scales well as the number of Spark executors in the cluster increases.
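The three steps above (tile, compute slope per tile, merge) can be illustrated with a minimal single-machine sketch. It is not the DCW strategy of the article, whose details are not given here. It assumes Horn's 3x3 finite-difference slope, a one-cell halo around each tile so that tile borders see their neighbors, and border cells of the whole DEM left as `None` (NoData). All function names are hypothetical. In a Spark setting, each tile would presumably be one record processed by an executor.

```python
import math

def horn_slope(dem, r, c, cell):
    """Slope in degrees at interior cell (r, c), Horn's 3x3 method (assumed here)."""
    a, b, cc = dem[r - 1][c - 1], dem[r - 1][c], dem[r - 1][c + 1]
    d, f = dem[r][c - 1], dem[r][c + 1]
    g, h, i = dem[r + 1][c - 1], dem[r + 1][c], dem[r + 1][c + 1]
    dzdx = ((cc + 2 * f + i) - (a + 2 * d + g)) / (8 * cell)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + cc)) / (8 * cell)
    return math.degrees(math.atan(math.hypot(dzdx, dzdy)))

def slope_whole(dem, cell):
    """Reference result: slope computed on the undivided DEM."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = horn_slope(dem, r, c, cell)
    return out

def split_tiles(rows, cols, size):
    """Step 1: yield (r0, r1, c0, c1) core extents of square tiles."""
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            yield r0, min(r0 + size, rows), c0, min(c0 + size, cols)

def extract_tile(dem, r0, r1, c0, c1):
    """Copy a tile plus a one-cell halo; return the block and its global origin."""
    rows, cols = len(dem), len(dem[0])
    hr0, hr1 = max(r0 - 1, 0), min(r1 + 1, rows)
    hc0, hc1 = max(c0 - 1, 0), min(c1 + 1, cols)
    return [row[hc0:hc1] for row in dem[hr0:hr1]], hr0, hc0

def slope_tiled(dem, cell, size):
    """Steps 2 and 3: compute slope tile by tile, then merge into one raster."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r0, r1, c0, c1 in split_tiles(rows, cols, size):
        block, hr0, hc0 = extract_tile(dem, r0, r1, c0, c1)
        # The interior of the haloed block is exactly the tile core
        # (minus cells on the border of the whole DEM).
        for br in range(1, len(block) - 1):
            for bc in range(1, len(block[0]) - 1):
                out[hr0 + br][hc0 + bc] = horn_slope(block, br, bc, cell)
    return out

# Usage: an inclined plane rising one unit per cell in x.
dem = [[float(c) for c in range(7)] for _ in range(5)]
assert slope_tiled(dem, 1.0, 3) == slope_whole(dem, 1.0)
```

The halo is the design choice that matters: without it, cells on tile edges would lack part of their 3x3 neighborhood and the merged raster would show seams.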
Viewshed analysis is an indispensable part of digital terrain analysis and is widely used in many application domains. High-resolution raster digital elevation model (DEM) data bring significant computational challenges to existing viewshed analysis algorithms, which are computationally intensive and require large memory space and massive computing power. The visibility calculation can be accelerated using Apache Spark. In this article, we present a Spark-based parallel computing approach for the XDraw algorithm, which is composed of a tile-based raster data storing strategy, an equivolume computing strategy, and a stream-merging write-back strategy. The parallel implementation of the XDraw algorithm consists of three main parts: partitioning a raster DEM file into square tile sets and reorganizing these tile sets to prevent tiles from straddling data divisions of the Hadoop Distributed File System; subdividing the DEM into multiple equivolume data sectors according to the viewpoint position; and performing the XDraw algorithm on the corresponding tile sets of each sector independently and writing back the viewshed results efficiently. Experiments on real-world datasets show that the proposed computing approach achieves higher speedup and efficiency for XDraw viewshed analysis as the raster DEM data volume increases dramatically. The results also show that the approach scales satisfactorily as the number of data nodes in the cluster increases.
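For readers unfamiliar with XDraw, the ring-by-ring idea behind it can be sketched on a single machine as follows. This is a simplified illustration, not the parallel implementation described above: it assumes square rings of growing Chebyshev distance around the viewpoint, stores for each cell the steepest blocking slope seen along its approximate ray, and linearly interpolates that value from the two nearest cells of the previous ring. The function names and the `observer_height` parameter are hypothetical.

```python
import math

def ring_cells(vr, vc, k, rows, cols):
    """Yield in-grid cells at Chebyshev distance k from the viewpoint."""
    for dr in range(-k, k + 1):
        dcs = range(-k, k + 1) if abs(dr) == k else (-k, k)
        for dc in dcs:
            r, c = vr + dr, vc + dc
            if 0 <= r < rows and 0 <= c < cols:
                yield r, c

def inner_horizon(horizon, vr, vc, dr, dc, k):
    """Interpolate the blocking slope where the ray crosses ring k - 1."""
    if abs(dr) == k:
        # Ray crosses a row of the inner ring; the column is fractional.
        r = vr + (k - 1) * (1 if dr > 0 else -1)
        lo = (dc * (k - 1)) // k            # exact integer floor
        rem = dc * (k - 1) - lo * k
        hi = lo + (1 if rem else 0)
        a, b = horizon[r][vc + lo], horizon[r][vc + hi]
    else:
        # Ray crosses a column of the inner ring; the row is fractional.
        c = vc + (k - 1) * (1 if dc > 0 else -1)
        lo = (dr * (k - 1)) // k
        rem = dr * (k - 1) - lo * k
        hi = lo + (1 if rem else 0)
        a, b = horizon[vr + lo][c], horizon[vr + hi][c]
    t = rem / k
    return (1 - t) * a + t * b

def xdraw_viewshed(dem, vr, vc, observer_height=0.0):
    """Return a boolean grid: True where the cell is visible from (vr, vc)."""
    rows, cols = len(dem), len(dem[0])
    zv = dem[vr][vc] + observer_height
    visible = [[False] * cols for _ in range(rows)]
    horizon = [[-math.inf] * cols for _ in range(rows)]
    visible[vr][vc] = True
    for k in range(1, max(vr, rows - 1 - vr, vc, cols - 1 - vc) + 1):
        for r, c in ring_cells(vr, vc, k, rows, cols):
            dr, dc = r - vr, c - vc
            own = (dem[r][c] - zv) / math.hypot(dr, dc)  # slope to this cell
            blocking = -math.inf if k == 1 else inner_horizon(
                horizon, vr, vc, dr, dc, k)
            visible[r][c] = own >= blocking
            horizon[r][c] = max(own, blocking)
    return visible

# Usage: a wall in column 2 hides everything behind it from a viewpoint at (1, 0).
dem = [[0.0, 0.0, 10.0, 0.0, 0.0] for _ in range(3)]
result = xdraw_viewshed(dem, 1, 0)
```

Because each ring depends only on the previous one along roughly radial directions, cells in different angular sectors around the viewpoint barely interact, which is what makes a sector-wise decomposition of the kind described above plausible.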