Abstract. Tropospheric ozone is a toxic greenhouse gas with a highly variable spatial distribution which is challenging to map on a global scale. Here, we present a data-driven ozone-mapping workflow generating a transparent and reliable product. We map the global distribution of tropospheric ozone from sparse, irregularly placed measurement stations to a high-resolution regular grid using machine learning methods. The produced map contains the average tropospheric ozone concentration of the years 2010–2014 with a resolution of 0.1° × 0.1°. The machine learning model is trained on AQ-Bench (“air quality benchmark dataset”), a pre-compiled benchmark dataset consisting of multi-year ground-based ozone measurements combined with an abundance of high-resolution geospatial data. Going beyond standard mapping methods, this work focuses on two key aspects to increase the integrity of the produced map. Using explainable machine learning methods, we ensure that the trained machine learning model is consistent with commonly accepted knowledge about tropospheric ozone. To assess the impact of data and model uncertainties on our ozone map, we show that the machine learning model is robust against typical fluctuations in ozone values and geospatial data. By inspecting the input features, we ensure that the model is only applied in regions where it is reliable. We provide a rationale for the tools we use to conduct a thorough global analysis. The methods presented here can thus be easily transferred to other mapping applications to ensure the transparency and reliability of the maps produced.
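The workflow described above amounts to training a regressor on station-level tabular data and then applying it to gridded geospatial features. Below is a minimal sketch of such a pipeline, assuming a scikit-learn-style setup; the file names and feature columns are hypothetical placeholders, not the actual AQ-Bench schema.

```python
# Minimal sketch of a station-to-grid mapping workflow.
# File names and feature columns are hypothetical, not the AQ-Bench schema.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Station data: one row per measurement station with geospatial features
# and the multi-year mean ozone target.
stations = pd.read_csv("aq_bench_stations.csv")                  # hypothetical file
features = ["altitude", "population_density", "nox_emissions"]  # illustrative subset
X = stations[features]
y = stations["o3_average_2010_2014"]                             # hypothetical column

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(X_train, y_train)
print(f"Hold-out R^2: {model.score(X_test, y_test):.2f}")

# Apply the trained model to the 0.1° x 0.1° grid of geospatial
# features to obtain the global ozone map.
grid = pd.read_csv("global_grid_features.csv")                   # hypothetical file
grid["o3_predicted"] = model.predict(grid[features])
```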
The spatial impact of a single shallow landslide is small compared to that of a deep-seated failure, and its damage potential is accordingly localized and limited. Yet the higher frequency of occurrence of shallow landslides, and their spatio-temporal correlation in response to external triggering events such as strong precipitation, result in substantial risks for population, infrastructure and environment. It is therefore essential to continuously investigate and analyze the spatial hazard that shallow landslides pose. Visualizing this hazard through regularly updated, dynamic hazard maps supports decision and policy makers. Even though a number of data-driven approaches for shallow landslide hazard mapping exist, a generic workflow has not yet been described. We therefore introduce a scalable and modular machine-learning-based workflow for shallow landslide hazard prediction in this study. The scientific test case for the development of the workflow investigates rainfall-triggered shallow landslide hazard in Switzerland. A benchmark dataset was compiled from a historic landslide database as presence data, together with a pseudo-random choice of absence locations, to train the data-driven model. At the current stage, this dataset comprises 14 features from topography, soil type, land cover and hydrology. This work also investigates a suitable approach for choosing absence locations and the influence of this choice on the predicted hazard, since this influence has not yet been comprehensively studied. We aim to enable time-dependent and dynamic hazard mapping by incorporating time-dependent precipitation data into the training dataset alongside the static features. Including temporal trigger factors, i.e. rainfall, enables a regularly updated landslide hazard map based on the precipitation forecast. Our approach includes identifying a suitable precipitation metric for the occurrence of shallow landslides at the absence locations, based on a statistical evaluation of the precipitation behavior at the presence locations. In this presentation, we describe the modular workflow as well as the benchmark dataset and show preliminary results, including the above-mentioned approaches for handling absence locations and time-dependent data.
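The presence/pseudo-absence setup described above can be illustrated with a short sketch. The uniform candidate sampling and the minimum-distance buffer around recorded landslides are assumptions for illustration, not necessarily the study's final strategy, and the coordinates are synthetic.

```python
# Illustrative pseudo-absence sampling: draw random candidate locations and
# keep only those farther than a buffer from any recorded landslide.
# Sampling scheme and buffer distance are assumptions; data is synthetic.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(42)

# Synthetic presence locations standing in for the historic landslide database.
presence_xy = rng.uniform(0, 100_000, size=(200, 2))   # metres, toy extent

buffer_m = 500.0                                        # assumed exclusion buffer
candidates = rng.uniform(0, 100_000, size=(2_000, 2))
dist, _ = cKDTree(presence_xy).query(candidates)        # distance to nearest presence
absence_xy = candidates[dist > buffer_m][: len(presence_xy)]  # balance classes

print(f"{len(presence_xy)} presences, {len(absence_xy)} pseudo-absences")
```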
Through the availability of multi-year ground-based ozone observations on a global scale, substantial geospatial metadata, and high-performance computing capacities, it is now possible to use machine learning for a global data-driven ozone assessment. In this presentation, we show a novel, completely data-driven approach to map tropospheric ozone globally.

Our goal is to interpolate ozone metrics and aggregated statistics from the database of the Tropospheric Ozone Assessment Report (TOAR) onto a global 0.1° × 0.1° resolution grid. Interpolating ozone, a toxic greenhouse gas, is challenging because its formation depends on many interconnected environmental factors at small scales. We conduct the interpolation with various machine learning methods trained on aggregated hourly ozone data from five years at more than 5500 locations worldwide. We use several geospatial datasets as training inputs to provide proxies for the environmental factors controlling ozone formation, such as precursor emissions and climate. The resulting maps contain different ozone metrics, i.e. statistical aggregations that are widely used to assess air pollution impacts on health, vegetation, and climate.

The key aspects of this contribution are twofold: first, we apply explainable machine learning methods to the data-driven ozone assessment; second, we discuss the dominant uncertainties relevant to ozone mapping and quantify their impact whenever possible. Our methods include a thorough a priori uncertainty estimation of the various data and methods, an assessment of scientific consistency, the identification of critical model parameters, the use of ensemble methods, and error modeling.

Our work aims to increase the reliability and integrity of the derived ozone maps by providing scientific robustness to a data-centric machine learning task. This study hence represents a blueprint for how to formulate an environmental machine learning task scientifically, gather the necessary data, and develop a data-driven workflow that optimizes the transparency and applicability of its product to maximize its scientific knowledge return.
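One simple explainability check of the kind mentioned above is permutation feature importance: each input is shuffled in turn and the resulting drop in model skill is measured, to verify that features known to control ozone dominate the model. The sketch below uses synthetic data and scikit-learn; the feature names are illustrative, not the actual TOAR inputs.

```python
# Permutation feature importance as a scientific-consistency check.
# Data and feature names are synthetic/illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))  # columns: "altitude", "temperature", "noise"
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=1000)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# A consistent model should rank the physically meaningful features highest.
for name, imp in zip(["altitude", "temperature", "noise"], result.importances_mean):
    print(f"{name:12s} importance: {imp:.3f}")
```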
Extreme weather situations are becoming increasingly frequent, with devastating consequences worldwide. Heavy rainfall events in July 2021 caused severe flash floods in western Germany, Belgium and the Netherlands, resulting in a high number of casualties and extensive material damage. The high hazard potential and short reaction times associated with these events make it necessary to develop efficient and reliable early warning systems (EWSs) to facilitate the preparation of response strategies. As nowcast precipitation forecasts continuously improve in both quality and spatial resolution, they are becoming an essential input for flash flood and landslide prediction models and therefore an important component of EWSs. However, the inherent uncertainties of radar-based nowcasting systems are carried over to the output of those prediction models. This study therefore analyzes the uncertainty sources of the nowcasting products of the German Weather Service (DWD), using the July 2021 flood event as a case study. More specifically, the objective is to determine whether the quality of precipitation nowcast products is sufficient for use in physics-based flood or landslide prediction models. Due to the complex nature of weather and rainfall structures, as well as their spatio-temporal variability, a traditional cell-by-cell comparison of predictions and ground truth is insufficient to quantify forecast quality. To overcome this issue, uncertainties in magnitude, time and space, and their respective sources, are identified using techniques from various fields of science. Subsequently, error propagation in flash flood prediction models is analyzed by applying the previously determined uncertainty ranges to a hydrological model.
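As one example of a verification technique that goes beyond cell-by-cell comparison, the sketch below implements the fractions skill score (FSS), a common neighbourhood-based metric that compares threshold-exceedance fractions within spatial windows. Choosing FSS here is our illustration, not necessarily a metric used in the study, and the rain fields are synthetic.

```python
# Fractions skill score (FSS): a neighbourhood verification metric that is
# tolerant of small spatial displacement errors. Fields below are synthetic.
import numpy as np
from scipy.ndimage import uniform_filter

def fss(forecast, observed, threshold, window):
    """FSS = 1 - MSE(fractions) / reference MSE, computed over windows."""
    f = uniform_filter((forecast >= threshold).astype(float), size=window)
    o = uniform_filter((observed >= threshold).astype(float), size=window)
    mse = np.mean((f - o) ** 2)
    ref = np.mean(f ** 2) + np.mean(o ** 2)
    return 1.0 - mse / ref if ref > 0 else np.nan

rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 2.0, size=(100, 100))   # synthetic rain field (mm/h)
fc = np.roll(obs, shift=5, axis=1)           # forecast with a pure spatial offset
print(f"FSS (10 mm/h, 11-cell window): {fss(fc, obs, 10.0, 11):.2f}")
```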
Several powerful physics-based computational landslide run-out models have been developed and validated in recent years. The geohazards community applies these forward models in simulation tools to predict potential landslide run-out outcomes, including their uncertainties, and uses inverse approaches to conduct reanalyses and to infer model parameters for calibration purposes. Yet it remains challenging to turn these computational frameworks into robust, transparent and transferable simulation-based decision support tools for geohazard mitigation. In particular, the landscape of uncertainties – such as those resulting from the idealized model description itself, from input data (e.g., material parameters or topographic data), and from hyperparameters related to the numerical scheme – is still not systematically managed when conducting landslide simulations. Probabilistic hazard maps that take these uncertainties into account imply a large number of model evaluations, which constitutes a computational bottleneck. This issue can be addressed by using high-performance computing (HPC) resources along with existing software. Alternatively, physics-informed machine learning strategies use simulation results of the original process model, i.e., the simulator, to train a statistically valid surrogate representation, the so-called emulator. Once trained, the emulator significantly reduces computational costs while also granting access to an estimate of the introduced error. A software framework has recently been set up to integrate Gaussian process emulation with the landslide run-out model r.avaflow, an open-source mass flow simulation tool. Emulation-based sensitivity analysis was of comparable quality to conventional studies, and the computational costs were cut significantly. For the first time, the emulator allowed a global sensitivity analysis to be conducted at every location simultaneously across a complete landslide impact area. In this contribution, a joint effort across different institutes in Europe tests the potential and limitations of the emulation techniques by revisiting a number of published case studies. Test cases were selected according to data availability, failure type and computational demand. Preliminary findings suggest that the emulator can substantially reduce the computational effort of modelling various flow-like landslides. Future work will focus on curating a well-defined database of test scenarios across multiple institutes, with cases ranging from small and medium-sized debris flows to large rock avalanches.
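The simulator-to-emulator idea can be sketched in a few lines: fit a Gaussian process to a small design of simulator runs, then predict, with an uncertainty estimate, at unseen parameter settings. In the sketch below the "simulator" is a cheap stand-in function rather than r.avaflow, and the design size and kernel choice are arbitrary assumptions.

```python
# Gaussian process emulation of an expensive simulator: train on a small
# design of runs, then predict with uncertainty at new parameter settings.
# The simulator here is a toy stand-in, not r.avaflow.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(theta):
    """Toy stand-in for an expensive run-out model (e.g. run-out distance)."""
    return np.sin(3.0 * theta[:, 0]) + 0.5 * theta[:, 1] ** 2

rng = np.random.default_rng(0)
design = rng.uniform(0, 1, size=(30, 2))   # 30 "expensive" training runs
runs = simulator(design)

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(design, runs)

new_theta = rng.uniform(0, 1, size=(5, 2))
mean, std = gp.predict(new_theta, return_std=True)
print("emulator mean:", np.round(mean, 3))
print("emulator std :", np.round(std, 3))  # estimate of the introduced error
```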