Abstract. Understanding Earth system dynamics in light of ongoing human intervention and dependency remains a major scientific challenge. The unprecedented availability of data streams describing different facets of the Earth now offers fundamentally new avenues to address this quest. However, several practical hurdles, especially the lack of data interoperability, limit the joint potential of these data streams. Today, many initiatives within and beyond the Earth system sciences are exploring new approaches to overcome these hurdles and meet the growing interdisciplinary need for data-intensive research; using data cubes is one promising avenue. Here, we introduce the concept of Earth system data cubes and how to operate on them in a formal way. The idea is that treating multiple data dimensions, such as spatial, temporal, variable, frequency, and other grids alike, allows effective application of user-defined functions to co-interpret Earth observations and/or model–data integration. An implementation of this concept combines analysis-ready data cubes with a suitable analytic interface. In three case studies, we demonstrate how the concept and its implementation facilitate the execution of complex workflows for research across multiple variables, and spatial and temporal scales: (1) summary statistics for ecosystem and climate dynamics; (2) intrinsic dimensionality analysis on multiple timescales; and (3) model–data integration. We discuss the emerging perspectives for investigating global interacting and coupled phenomena in observed or simulated data. In particular, we see many emerging perspectives of this approach for interpreting large-scale model ensembles. The latest developments in machine learning, causal inference, and model–data integration can be seamlessly implemented in the proposed framework, supporting rapid progress in data-intensive research across disciplinary boundaries.
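The idea of treating all dimensions alike and applying a user-defined function along any chosen subset of them can be illustrated with a minimal sketch. This is not the authors' implementation: the `Cube` class, its `map_over` method, the dimension names, and the toy values are all hypothetical and use only the Python standard library.

```python
from itertools import product
from statistics import mean


class Cube:
    """Toy data cube (hypothetical): named dimensions plus a coordinate-keyed value store."""

    def __init__(self, dims, coords, values):
        self.dims = tuple(dims)                      # e.g. ("time", "lat", "var")
        self.coords = {d: list(coords[d]) for d in self.dims}
        self.values = dict(values)                   # maps coordinate tuples to numbers

    def map_over(self, reduce_dims, func):
        """Apply func along reduce_dims for every combination of the remaining coordinates."""
        reduce_dims = tuple(reduce_dims)
        keep = tuple(d for d in self.dims if d not in reduce_dims)
        out = {}
        for kept in product(*(self.coords[d] for d in keep)):
            fixed = dict(zip(keep, kept))
            block = []
            for red in product(*(self.coords[d] for d in reduce_dims)):
                idx = {**fixed, **dict(zip(reduce_dims, red))}
                block.append(self.values[tuple(idx[d] for d in self.dims)])
            out[kept] = func(block)                  # the user-defined function sees one block
        return Cube(keep, {d: self.coords[d] for d in keep}, out)


# Toy cube with three dimensions; the variable axis is just another dimension.
coords = {"time": [0, 1, 2], "lat": [10, 20], "var": ["t2m", "gpp"]}
values = {
    (t, la, v): float(t + la) if v == "t2m" else float(t * la)
    for t, la, v in product(coords["time"], coords["lat"], coords["var"])
}
cube = Cube(("time", "lat", "var"), coords, values)

# The same call reduces over time alone or over time and variable together.
time_mean = cube.map_over(["time"], mean)
lat_max = cube.map_over(["time", "var"], max)
```

Because no dimension is privileged in this sketch, the same `map_over` call serves for temporal statistics, cross-variable reductions, or any combination, which is the property the abstract attributes to the data cube concept.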
<p>As the volume of information from Earth-observing satellites grows, users in science and applications face increasingly demanding and complex technical and methodological prerequisites for generating demand-driven products while exploiting the full potential of large Earth observation (EO) data archives. Since 2014, the European Space Agency (ESA) has been addressing this challenge with the concept of Thematic Exploitation Platforms (TEPs), aiming to create an ecosystem of interconnected platforms that provide thematic EO-based data and services for what are currently seven thematic sectors.</p><p>The built-environment and urban sector is addressed by the Urban Thematic Exploitation Platform (UrbanTEP; urban-tep.eu), acknowledging that urbanization and sustainable settlement growth are key global challenges. The linkages to socio-economic development, health, environment, greenhouse gas emissions, climate change and other sectors are deep and multi-faceted. EO-based services, the resulting information products, and other spatial datasets have successfully found their way into planning and decision-making processes that address the urban ecosystem. While a range of downstream services are based on stand-alone, labor-intensive processing and visualization solutions, the platform-based approach has proven to be a game-changing technology capable of revolutionizing service provision, workflows and information products.</p><p>UrbanTEP is a collaborative system that focuses on EO data provision, processing and other spatial products for delivering multi-source information on trans-sectoral urban challenges at various scales. It is developed to provide end-to-end and ready-to-use solutions for a wide spectrum of users in the public and private sectors.
The core system components are an open, web-based portal connected to distributed and scalable high-level computing infrastructures and providing key functionalities for:</p><ul><li>high-performance data access and processing (IaaS &#8211; Infrastructure as a Service),</li> <li>modular and generic state-of-the-art pre-processing, analysis, and visualization tools and algorithms (SaaS &#8211; Software as a Service),</li> </ul><ul><li>customized development and sharing of algorithms, products and services (PaaS &#8211; Platform as a Service), and</li> <li>networking and communication.</li> </ul><p>Facilitating the acceptance and uptake of EO services by the urban community and onboarding third-party service providers are essential to PaaS solutions. UrbanTEP is therefore in the process of expanding the range of service solutions and the interconnection with other service providers. The concept of &#8220;City Data Cubes&#8221; is introduced for urban use cases, and algorithm hosting capabilities (&#8220;algo-as-a-service&#8221; functionalities) are improved by adopting the OGC Common Architecture standard. In addition, the data analytics and visualization capabilities of UrbanTEP provide functionalities for a user-driven derivation of key urban indicators based on the above-mentioned multi-source data collections. The provision of premium urban information products, such as the World Settlement Footprint (WSF), which outlines built-up areas globally, allows users and service providers to derive customized, demand-driven EO-based products.</p>
<p>Compound heat waves and drought events draw our particular attention as they become more frequent. Co-occurring extreme events often exacerbate impacts on ecosystems and can induce a cascade of detrimental consequences. However, research to understand these events is still in its infancy. DeepExtremes is a project funded by the European Space Agency (https://rsc4earth.de/project/deepextremes/) that aims to use deep learning to gain insight into the Earth's surface under extreme climate conditions. Specifically, the goal is to forecast and explain extreme, multi-hazard, and compound events. To this end, the project leverages the existing Earth observation archive to help us better understand and represent different types of hazards and their effects on society and vegetation. The project implementation involves a multi-stage process consisting of 1) global event detection; 2) intelligent subsampling and creation of mini-data-cubes; 3) forecasting methods development, interpretation, and testing; and 4) cloud deployment and upscaling. The data products will be made available to the community following reproducibility and FAIR data principles. By effectively combining Earth system science with explainable AI, the project contributes knowledge towards the sustainable management of the consequences of extreme events. This presentation will show the progress made so far and specifically introduce how to participate in the DeepExtremes challenges on spatio-temporal extreme event prediction.</p>
<p>The Deep Earth System Data Lab (DeepESDL, https://earthsystemdatalab.net) provides an AI-ready, collaborative environment enabling researchers to understand the complex dynamics of the Earth System using numerous datasets and multi-variate, empirical approaches. The solution builds on work done in previous projects funded by the European Space Agency (CAB-LAB and ESDL), which established the technical foundations and created measurable value for the scientific community, e.g., Mahecha et al. (2020, https://doi.org/10.5194/esd-11-201-2020) or Flach et al. (2018, https://doi.org/10.5194/bg-15-6067-2018). DeepESDL relies heavily on the well-established open-source technology stacks for data science in Python, thus ensuring usability and compatibility.</p> <p>The core of DeepESDL is programmatic access to various data sources in analysis-ready form, organised in data cubes and combined with adequate computational resources and capabilities, so that researchers can immediately focus on efficient analysis of multi-variate and high-dimensional data through empirical methods or AI approaches.</p> <p>To ensure proper documentation and discoverability, DeepESDL is building an informative catalogue for finding all available data and the metainformation describing them. This includes not only standard information, e.g., on spatial and temporal coverage and versioning, but also information on the specific transformation methods applied during data cube generation.</p> <p>The system design has openness, collaboration, and dissemination as key guiding principles.
As science teams need proper tooling support to work together efficiently in this virtual environment, one of the key elements of the architecture is the DeepESDL Hub, which provides teams of scientific users with the means for collaboration and exchange of versioned results, source code, models, execution parameters, and other artifacts and outcomes of their activities in a simple, safe and reliable way. The tools are complemented by an integrated, state-of-the-art application for visualising all data in the virtual laboratory, including input data, intermediate results, and final products.</p> <p>Furthermore, DeepESDL supports the implementation and execution of Machine Learning workflows on Analysis Ready Data Cubes in a reproducible and FAIR way. It allows sharing and versioning of all ML artifacts, such as code, data, models, execution parameters, metrics, and results, as well as tracking each step of the ML workflow for an experiment (supported by integration with open-source tools like TensorBoard or MLflow), so that others can reproduce the experiments and contribute.</p> <p>Finally, dissemination is essential for the Open Science spirit of DeepESDL. Two applications, xcube Viewer and 4D viewer, offer comprehensive user interfaces for interactive exploration of multi-variate data cubes. Both use the same RESTful data service API provided by xcube Server. xcube Server also provides OGC interfaces, so that other OGC-compliant applications, such as QGIS3, are able to visualise analysis-ready data cubes generated within DeepESDL.</p> <p>To foster collaboration, additional features are being explored, such as publishing individual Jupyter Notebooks as storytelling documents or even books using Jupyter Book from the Executable Book Project, together with DeepESDL User Project Dashboards, which may also link to the viewers and Notebooks.</p>