New sources of geospatial data, such as the Internet of Things (IoT), Volunteered Geographic Information (VGI), and Open Geospatial Data, are becoming increasingly popular. This shift creates a demand for new ways to collect, manage, store, and analyse geospatial data. These challenges are mirrored in the general computer science concept of big data, a term describing datasets that are too large to be managed and processed by traditional technologies [1]. Laney [2] characterizes big data using the 3 Vs: Volume, Velocity, and Variety. These properties relate to geospatial data as well. Massive geospatial datasets originating from sensors are characterized by both high Volume and high Velocity, and open geospatial datasets from disparate sources come with a high degree of Variety. This means that geospatial big data can be treated as a subset of big data, which opens up the possibility of using big data techniques to handle geospatial data [3, 4]. NoSQL (or Not Only SQL) data stores are one proposed solution to some of the challenges posed by big data. These data stores offer ways to handle the 3 Vs using new techniques and architectures.
Diffs, a concept known from source code version control systems such as git, are of interest for geospatial, event-based workflows. We investigate how the native mathematical structure of vector geometries can be utilized to create a diffing algorithm tailored to geospatial vector data. Diffing algorithms are a well-researched area dating back to the 1970s; however, we find that geospatial diffing operations tend to be carried out using generic algorithms combined with a pre- and post-processing step. We created GeomDiff, an algorithm and storage format tailored to geospatial vector data. The creation time, apply/undo time, and patch size of GeomDiff were compared to those of three generic algorithms by running an online experiment using 2.5 million real-world geometry pairs from OpenStreetMap. We found that the GeomDiff algorithm performs better than or on par with the alternatives on point geometries and on complex geometries with a small (< 500) vertex count. We argue that there are both computation time and storage space improvements to be gained by using a tailored diffing algorithm for geospatial vector data. These promising first results encourage further refinement of the algorithm so that it can handle complex geometries efficiently as well.
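To make the three operations measured in the experiment (patch creation, apply, and undo) concrete, the following is a minimal sketch of a vertex-level diff for a line geometry. It is not the GeomDiff algorithm or its storage format, neither of which is specified in this text. The function names `create_patch`, `apply_patch`, and `undo_patch` are hypothetical, vertices are assumed to be tuples of integer coordinates, and the vertex matching is delegated to Python's generic `difflib.SequenceMatcher` rather than to a geometry-specific method.

```python
from difflib import SequenceMatcher


def create_patch(old, new):
    """Return the vertex-level edits that turn `old` into `new`.

    Each edit is (old_index, new_index, removed_vertices, added_vertices).
    Storing both vertex lists is what makes the patch reversible.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    patch = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            patch.append((i1, j1, old[i1:i2], new[j1:j2]))
    return patch


def apply_patch(old, patch):
    """Apply a patch to the old geometry, yielding the new geometry."""
    result = list(old)
    # Work from the last edit backwards so earlier indices stay valid.
    for i1, _j1, removed, added in reversed(patch):
        result[i1:i1 + len(removed)] = added
    return result


def undo_patch(new, patch):
    """Revert a patch on the new geometry, yielding the old geometry."""
    result = list(new)
    for _i1, j1, removed, added in reversed(patch):
        result[j1:j1 + len(added)] = removed
    return result


# Illustrative data only: a square outline that gains one vertex.
old_line = [(0, 0), (10, 0), (10, 10), (0, 10)]
new_line = [(0, 0), (10, 0), (12, 5), (10, 10), (0, 10)]

patch = create_patch(old_line, new_line)
restored_new = apply_patch(old_line, patch)
restored_old = undo_patch(new_line, patch)
```

In this example the patch holds only the single inserted vertex and its position, which illustrates why patch size can stay small when few vertices change. A tailored algorithm would presumably exploit geometric structure beyond what this sequence-based sketch does.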