Terra Populus’ architecture for integrated big geospatial services

Haynes, David; Manson, Steven M.; Shook, Eric

doi:10.1111/tgis.12286

Cited by 13 publications (22 citation statements)

References 32 publications (35 reference statements)

“…The geospatial big data distributed cluster solution is built using the open-source projects Greenplum and PostGIS [47]. Greenplum's architecture ( Figure 2) uses massively parallel processing (MPP).…”

Section: Data Clustering (mentioning)

confidence: 99%

BiGeo: A Foundational PaaS Framework for Efficient Storage, Visualization, Management, Analysis, Service, and Migration of Geospatial Big Data—A Case Study of Sichuan Province, China

Liu, Hao, Yang. 2019. IJGI.

With the rapid development of big data, numerous industries have turned their focus from information research and construction to big data technologies. Earth science and geographic information systems industries are highly information-intensive, and thus there is an urgent need to study and integrate big data technologies to improve their level of information. However, there is a large gap between existing big data and traditional geographic information technologies. Owing to certain characteristics, it is difficult to quickly and easily apply big data to geographic information technologies. Through the research, development, and application practices achieved in recent years, we have gradually developed a common geospatial big data solution. Based on the formation of a set of geospatial big data frameworks, a complete geospatial big data platform system called BiGeo was developed. Through the management and analysis of massive amounts of spatial data from Sichuan Province, China, the basic framework of this platform can be better utilized to meet our needs. This paper summarizes the design, implementation, and experimental experience of BiGeo, which provides a new type of solution to the research and construction of geospatial big data.

“…PostGIS allows for the storage and analysis of both vector and raster data types. Haynes et al (2017) discusses some of the obstacles encountered with high-performance computing using both spatial data types.…”

Section: IPUMS-Terra (mentioning)

confidence: 99%

“…Raster summarizations were initially degrading the performance of our system as they are computationally expensive. Haynes (2017) discusses why query performance times of raster analyses with PostgreSQL vary greatly and we have implemented a data caching system to reduce additional calculations. The caching key is generated from the following variables: geographic level, temporal time point or range, and the raster variable and raster operation (e.g., minimum value, maximum value).…”

Section: IPUMS-Terra Solutions for Big Data Integration (mentioning)

confidence: 99%
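The caching scheme described in the statement above can be illustrated with a minimal Python sketch. It is not the IPUMS-Terra implementation: the function names (`make_cache_key`, `cached_summary`), the JSON-plus-SHA-256 encoding, and the in-memory dictionary are all assumptions chosen for illustration. Only the four key components (geographic level, time point or range, raster variable, raster operation) come from the quoted text.

```python
import hashlib
import json

def make_cache_key(geographic_level, time_point_or_range, raster_variable, raster_operation):
    """Build a deterministic key from the four variables named in the statement.

    Hypothetical helper: the hashing scheme is an assumption, not the authors' method.
    """
    # Normalise a single time point and a (start, end) range to the same shape.
    if isinstance(time_point_or_range, (list, tuple)):
        temporal = [str(t) for t in time_point_or_range]
    else:
        temporal = [str(time_point_or_range)]
    payload = {
        "geographic_level": geographic_level,
        "temporal": temporal,
        "raster_variable": raster_variable,
        "raster_operation": raster_operation,
    }
    # sort_keys makes the serialisation, and therefore the hash, stable.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

# Assumed in-memory store; a real system would more likely use a database table.
_cache = {}

def cached_summary(geographic_level, time_point_or_range, raster_variable,
                   raster_operation, compute):
    """Return a cached raster summary, calling compute() only on a cache miss."""
    key = make_cache_key(geographic_level, time_point_or_range,
                         raster_variable, raster_operation)
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]
```

With this sketch, two requests for the same level, period, variable, and operation share one key, so the expensive raster summarization runs once; changing any single component (for example, `"max"` to `"min"`) yields a different key.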

IPUMS-Terra: integrated big heterogeneous spatiotemporal data analysis system

2018

Self Cite

Big Geo Data promises tremendous benefits to the GIS Science community in particular and the broader scientific community in general, but has been primarily of use to the relatively small body of GI-Scientists who possess the specialized knowledge and methods necessary for working with this class of data. Much of the greater scientific community is not equipped with the expert knowledge and techniques necessary to fully take advantage of the promise of big spatial data. IPUMS-Terra provides integrated spatiotemporal data to these scholars by simplifying access to thousands of raster and vector datasets, integrating them, and providing them in formats that are usable by a broad array of research disciplines. IPUMS-Terra exemplifies a new class of National Spatial Data Infrastructure because it connects a large spatial data repository to advanced computational resources, allowing users to access the needle of information they need from the haystack of big spatial data. The project is trailblazing in its commitment to the open sharing of spatial data and spatial tool development, including describing its architecture and process development workflows, and openly sharing its products for the general use of the scientific community.

“…This paper studies the zonal statistics problem which combines raster data, e.g., temperature, with vector data, e.g., city boundaries, to compute aggregate values for each polygon, e.g., average temperature in each city. This problem has several applications including the study by ecologists on the effect of vegetation and temperature on human settlement [3,4], analyzing terabytes of socio-economic and environmental data [5,6], and studying of land use and land cover classification [7]. It can also be used for areal interpolation [8] and to assess the risk of wildfires [9].…”

Section: Introduction (mentioning)

confidence: 99%

“…Traditional methods to process the zonal statistics problem focused on either vectorizing the raster dataset [16] or rasterizing the vector data [5]. The first approach converts each pixel to a point and then runs a spatial join with polygons using a point-in-polygon predicate [16].…”

Section: Introduction (mentioning)

confidence: 99%
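The "vectorize the raster" approach described in the two statements above can be sketched in plain Python: each pixel becomes a point at its center, a point-in-polygon test assigns it to a zone, and values are averaged per polygon. This is a toy illustration under stated assumptions, not the method of any cited system. The function names, the unit-square pixel grid, and the ray-casting test are all choices made here, and the nested loop over every pixel and polygon is exactly the cost that distributed approaches aim to avoid.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; points exactly on an edge are not
    handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def zonal_mean(raster, polygons):
    """Average raster values per polygon by treating each pixel as a point.

    raster: list of rows of numbers; the pixel at (row, col) is assumed to
            cover the unit square with center (col + 0.5, row + 0.5).
    polygons: dict mapping a zone name to its list of vertices.
    """
    sums = {name: 0.0 for name in polygons}
    counts = {name: 0 for name in polygons}
    for row, values in enumerate(raster):
        for col, value in enumerate(values):
            px, py = col + 0.5, row + 0.5  # pixel center as a point
            for name, polygon in polygons.items():
                if point_in_polygon(px, py, polygon):
                    sums[name] += value
                    counts[name] += 1
    # Zones that contain no pixel centers get None rather than a mean.
    return {name: (sums[name] / counts[name] if counts[name] else None)
            for name in polygons}
```

For a 4x4 raster holding the values 1 to 16 row by row, split into a "west" zone covering columns 0 to 1 and an "east" zone covering columns 2 to 3, `zonal_mean` returns 7.5 and 9.5 respectively.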

Distributed zonal statistics of big raster and vector data

Singla, Eldawy. 2018. Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems.

Recent advancements in remote sensing technology have resulted in petabytes of data in raster format. This data is often processed in combination with high resolution vector data that represents, for example, city boundaries. One of the common operations that combine big raster and vector data is zonal statistics, which computes some statistics for each polygon in the vector dataset. This paper models the zonal statistics problem as a join problem and proposes a novel distributed system that can scale to petabytes of raster and vector data. The proposed method does not require any preprocessing or indexing, which makes it well suited to the ad-hoc queries that scientists usually want to run. We devise a theoretical cost model that proves the efficiency of our algorithm over the baseline method. Furthermore, we run an extensive experimental evaluation on large scale satellite data with up to a trillion pixels, and big vector data with up to hundreds of millions of edges, and we show that our method can perfectly scale to big data with up to two orders of magnitude performance gain over Rasdaman and Google Earth Engine. This paper models the zonal statistics problem as a join problem as it needs to find the pixels in the raster layer that overlap the polygons in the vector layer. We explain, with the support of a theoretical analysis, that existing approaches are analogous to two common join algorithms, namely, index nested-loop join and hash join. This analogy highlights the main limitations of these two algorithms as the data size increases. Therefore, we propose a novel distributed algorithm, termed Raptor Zonal Statistics, which resembles the sort-merge join for big raster and vector data. We show
