The scientific community and relevant authorities devote considerable effort and resources to monitoring and forecasting environmental conditions. Representative examples arise in meteorology, oceanography, and environmental engineering. As a consequence, high volumes of data are generated, including data produced by earth observation systems and by different kinds of models. Specific data models, formats, vocabularies, and data access infrastructures have been developed and are currently in use by the scientific community. As a result, discovering, accessing, and analyzing environmental datasets requires very specific skills, which is an important barrier to their reuse in many other application domains. This paper reviews earth science data representation and access standards and technologies, and identifies the main challenges that must be overcome to enable their integration into semantic open data infrastructures. Such integration would allow non-scientific information technology practitioners to devise new end-user solutions for citizen problems in new application domains.
Both the increasing number of GPS-enabled mobile devices and geographic crowdsourcing initiatives, such as OpenStreetMap, are driving the large amount of vector spatial data currently being produced. In turn, the automatic generation of raster data by remote sensing devices and environmental modeling processes has always led to very large datasets. Currently, improved sensor observation systems and data processing infrastructures reach huge data generation rates. As an example, the Sentinel Data Access System of the Copernicus Program of the European Space Agency (ESA) published 38.71 TB of data per day during 2020. This paper shows how adopting a new spatial data model that includes multi-resolution parametric spatial data types enables an efficient implementation of a large-scale distributed spatial analysis system for integrated vector-raster data lakes. In particular, the proposed implementation outperforms state-of-the-art Spark-based spatial analysis systems by more than one order of magnitude during vector-raster spatial join evaluation.
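For readers unfamiliar with the operation being benchmarked, the following is a minimal single-machine sketch of a vector-raster spatial join: each polygon is matched with the raster cells whose centres fall inside it. It is a naive baseline for illustration only, not the multi-resolution data model or the distributed implementation described in the abstract. The function names, the grid layout (row index increasing with y), and the `origin` and `cell_size` parameters are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle 'inside' at each edge crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def vector_raster_join(
    polygons: Dict[str, List[Point]],
    raster: List[List[float]],
    origin: Point = (0.0, 0.0),
    cell_size: float = 1.0,
) -> Dict[str, List[float]]:
    """Naive join: for each polygon, collect values of cells whose centre lies inside it."""
    ox, oy = origin
    result: Dict[str, List[float]] = {}
    for name, poly in polygons.items():
        values = []
        for row, line in enumerate(raster):
            for col, value in enumerate(line):
                # Assumed layout: cell (row, col) covers one cell_size square from the origin.
                centre = (ox + (col + 0.5) * cell_size, oy + (row + 0.5) * cell_size)
                if point_in_polygon(centre, poly):
                    values.append(value)
        result[name] = values
    return result


raster = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
joined = vector_raster_join({"a": square}, raster)
```

This sketch tests every cell against every polygon, so its cost grows with the product of the two input sizes. That quadratic behaviour is the kind of cost that spatial indexing and distributed evaluation aim to avoid.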