Recent advancements in remote sensing technology have resulted in petabytes of data in raster format. This data is often processed in combination with high resolution vector data that represents, for example, city boundaries. One of the common operations that combine big raster and vector data is the zonal statistics which computes some statistics for each polygon in the vector dataset. This paper models the zonal statistics problem as a join problem and proposes a novel distributed system that can scale to petabytes of raster and vector data. The proposed method does not require any preprocessing or indexing which makes it perfect for ad-hoc queries that scientists usually want to run. We devise a theoretical cost model that proves the efficiency of our algorithm over the baseline method. Furthermore, we run an extensive experimental evaluation on large scale satellite data with up-to a trillion pixels, and big vector data with up-to hundreds of millions of edges, and we show that our method can perfectly scale to big data with up-to two orders of magnitude performance gain over Rasdaman and Google Earth Engine. This paper models the zonal statistics problem as a join problem as it needs to find the pixels in the raster layer that overlap the polygons in the vector layer. We explain, with a support of a theoretical analysis, that existing approaches are analogous to two common join algorithms, namely, index nested-loop join and hash join. This analogy highlights the main limitations of these two algorithms as the data size increases. Therefore, we propose a novel distributed algorithm, termed Raptor Zonal Statistics, which resembles the sort-merge join for big raster and vector data. We show
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.