This paper synthesizes multiple methods for machine learning (ML) model interpretation and visualization (MIV) focusing on meteorological applications. ML has recently exploded in popularity in many fields, including meteorology. Although ML has been successful in meteorology, it has not been as widely accepted, primarily due to the perception that ML models are “black boxes,” meaning the ML methods are thought to take inputs and provide outputs but not to yield physically interpretable information to the user. This paper introduces and demonstrates multiple MIV techniques for both traditional ML and deep learning, to enable meteorologists to understand what ML models have learned. We discuss permutation-based predictor importance, forward and backward selection, saliency maps, class-activation maps, backward optimization, and novelty detection. We apply these methods at multiple spatiotemporal scales to tornado, hail, winter precipitation type, and convective-storm mode. By analyzing such a wide variety of applications, we intend for this work to demystify the black box of ML, offer insight in applying MIV techniques, and serve as an MIV toolbox for meteorologists and other physical scientists.
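As a rough illustration of permutation-based predictor importance, the sketch below shuffles one predictor column at a time and records how much the model error grows. It is a minimal single-pass version under stated assumptions, not the paper's implementation: the function names, the mean-squared-error metric, the toy data, and the toy model are all hypothetical.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in error when each predictor column is shuffled.

    predict: callable mapping one row (list of floats) to a prediction.
    X: list of rows, each a list of floats. y: list of targets.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data (assumed): the target depends only on the first predictor.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in X]


def toy_model(row):
    """Stand-in for a trained model; it uses only the first predictor."""
    return 3.0 * row[0]


importances = permutation_importance(toy_model, X, y)
```

In this toy setup the first predictor should receive a large importance and the second an importance of zero, because the stand-in model ignores it.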
High-impact weather events, such as severe thunderstorms, tornadoes, and hurricanes, cause significant disruptions to infrastructure, property loss, and even fatalities. High-impact events can also affect society positively, for example through savings from renewable energy. Prediction of these events has improved substantially with greater observational capabilities, increased computing power, and better model physics, but there is still significant room for improvement. Artificial intelligence (AI) and data science technologies, specifically machine learning and data mining, bridge the gap between numerical model prediction and real-time guidance by improving accuracy. AI techniques also extract otherwise unavailable information from forecast models by fusing model output with observations to provide additional decision support for forecasters and users. In this work, we demonstrate that applying AI techniques along with a physical understanding of the environment can significantly improve the prediction skill for multiple types of high-impact weather. The AI approach is also a contribution to the growing field of computational sustainability. We specifically discuss the prediction of storm duration, severe wind, severe hail, precipitation classification, forecasting for renewable energy, and aviation turbulence. We also discuss how AI techniques can process “big data,” provide insights into high-impact weather phenomena, and improve our understanding of high-impact weather.
This paper describes the use of convolutional neural networks (CNNs), a type of deep learning, to identify fronts in gridded data, followed by a novel postprocessing method that converts probability grids to objects. Synoptic-scale fronts are often associated with extreme weather in the midlatitudes. Predictors are 1000-mb (1 mb = 1 hPa) grids of wind velocity, temperature, specific humidity, wet-bulb potential temperature, and/or geopotential height from the North American Regional Reanalysis. Labels are human-drawn fronts from Weather Prediction Center bulletins. We present two experiments to optimize parameters of the CNN and object conversion. To evaluate our system, we compare the objects (predicted warm and cold fronts) with human-analyzed warm and cold fronts, matching fronts of the same type within a 100- or 250-km neighborhood distance. At 250 km our system obtains a probability of detection of 0.73, success ratio of 0.65 (or false-alarm ratio of 0.35), and critical success index of 0.52. These values drastically outperform the baseline, which is a traditional method from numerical frontal analysis. Our system is not intended to replace human meteorologists, but to provide an objective method that can be applied consistently and easily to a large number of cases. Our system could be used, for example, to create climatologies and quantify the spread in forecast frontal properties across members of a numerical weather prediction ensemble.
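The three verification scores quoted above are linked through the contingency table, which the following sketch makes explicit. It assumes the standard definitions (hits, false alarms, misses); the function names and the example counts are hypothetical and are not taken from the paper.

```python
def contingency_scores(hits, false_alarms, misses):
    """Return (POD, success ratio, CSI) from contingency-table counts."""
    pod = hits / (hits + misses)                    # probability of detection
    success_ratio = hits / (hits + false_alarms)    # 1 - false-alarm ratio
    csi = hits / (hits + false_alarms + misses)     # critical success index
    return pod, success_ratio, csi


def csi_from_pod_and_sr(pod, success_ratio):
    """CSI implied by POD and success ratio: 1 / (1/POD + 1/SR - 1)."""
    return 1.0 / (1.0 / pod + 1.0 / success_ratio - 1.0)


# Hypothetical counts, used only to check that the two routes agree.
pod, success_ratio, csi = contingency_scores(hits=13, false_alarms=7, misses=5)
csi_identity = csi_from_pod_and_sr(pod, success_ratio)

# Scores reported in the abstract at the 250-km neighborhood distance.
csi_reported = csi_from_pod_and_sr(0.73, 0.65)
```

With the reported probability of detection (0.73) and success ratio (0.65), the identity gives a critical success index that rounds to the reported 0.52, so the three numbers are mutually consistent.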
Forecasting severe hail accurately requires predicting how well atmospheric conditions support the development of thunderstorms, the growth of large hail, and the minimal loss of hail mass to melting before reaching the surface. Existing hail forecasting techniques incorporate information about these processes from proximity soundings and numerical weather prediction models, but they make many simplifying assumptions, are sensitive to differences in numerical model configuration, and are often not calibrated to observations. In this paper a storm-based probabilistic machine learning hail forecasting method is developed to overcome the deficiencies of existing methods. An object identification and tracking algorithm locates potential hailstorms in convection-allowing model output and gridded radar data. Forecast storms are matched with observed storms to determine hail occurrence and the parameters of the radar-estimated hail size distribution. The database of forecast storms contains information about storm properties and the conditions of the prestorm environment. Machine learning models are used to synthesize that information to predict the probability of a storm producing hail and the radar-estimated hail size distribution parameters for each forecast storm. Forecasts from the machine learning models are produced using two convection-allowing ensemble systems and the results are compared to other hail forecasting methods. The machine learning forecasts have a higher critical success index (CSI) at most probability thresholds and greater reliability for predicting both severe and significant hail.
We introduce an efficient approach to mining multi-dimensional temporal streams of real-world data for ordered temporal motifs that can be used for prediction. Since many of the dimensions of the data are known or suspected to be irrelevant, our approach first identifies the salient dimensions of the data, then the key temporal motifs within each dimension, and finally the temporal ordering of the motifs necessary for prediction. For the prediction element, the data are assumed to be labeled. We tested the approach on two real-world data sets. To verify the generality of the approach, we validated the application on several subjects from the CMU Motion Capture database. Our main application uses several hundred numerically simulated supercell thunderstorms where the goal is to identify the most important features and feature interrelationships which herald the development of strong rotation in the lowest altitudes of a storm. We identified sets of precursors, in the form of meteorological quantities reaching extreme values in a particular temporal sequence, unique to storms producing strong low-altitude rotation. The eventual goal is to use this knowledge for future severe weather detection and prediction algorithms.
Although NEXRAD radars have proven to be an effective tool for detecting airborne animals, detecting biological phenomena in radar images often involves a manual, time-consuming data-extraction process. This paper focuses on applying machine learning to automatically find radar images that capture large aggregations of birds (specifically Purple Martins and Tree Swallows) as they depart en masse from roosting sites. These aggregations are evident in radar images as rings of elevated reflectivity that appear early in the morning as birds depart from roost sites. Our goal was to develop an algorithm that could determine whether an individual radar image contained at least one Purple Martin or Tree Swallow roost. We used a dataset of known roost locations to train three machine learning algorithms that employed (1) a traditional Artificial Neural Network (ANN), (2) a sophisticated preexisting Convolutional Neural Network (CNN) called Inception-v3, and (3) a shallow CNN built from scratch. The resulting programs were all effective at finding bird roosts, with both the shallow CNN and the Inception-v3 network making correct determinations about 90 percent of the time with an AUC above 0.9. To the best of our knowledge, this study is the first to apply neural networks in the analysis of bird roosts in radar imagery, and these analytical tools offer new avenues of research into the ecology and behavior of flying animals, with practical applications to wind farm placement, air traffic administration and wildlife conservation. The NEXRAD radar network offers a tremendous archive of continental-scale data and has the potential to capture entire vertebrate populations. Applying existing machine learning models to a new dataset is a valuable approach to extracting information from this archive.
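For readers unfamiliar with the AUC figure quoted above, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The function name, scores, and labels are illustrative assumptions, not data from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for six radar images (1 = roost present).
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.7]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; a sort-based rank formula would be the usual choice for large datasets.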
Thunderstorms in the United States cause over 100 deaths and $10 billion (U.S. dollars) in damage per year, much of which is attributable to straight-line (nontornadic) wind. This paper describes a machine-learning system that forecasts the probability of damaging straight-line wind (≥50 kt or 25.7 m s−1) for each storm cell in the continental United States, at distances up to 10 km outside the storm cell and lead times up to 90 min. Predictors are based on radar scans of the storm cell, storm motion, storm shape, and soundings of the near-storm environment. Verification data come from weather stations and quality-controlled storm reports. The system performs very well on independent testing data. The area under the receiver operating characteristic (ROC) curve ranges from 0.88 to 0.95, the critical success index (CSI) ranges from 0.27 to 0.91, and the Brier skill score (BSS) ranges from 0.19 to 0.65 (>0 is better than climatology). For all three scores, the best value occurs for the smallest distance (inside storm cell) and/or lead time (0–15 min), while the worst value occurs for the greatest distance (5–10 km outside storm cell) and/or lead time (60–90 min). The system was deployed during the 2017 Hazardous Weather Testbed.
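The Brier skill score mentioned above compares the forecast's Brier score with that of a climatological reference, so positive values beat climatology. The sketch below is a minimal version that assumes the reference is the event frequency of the same sample; the paper may define climatology differently (for example, from training data). All names and numbers are hypothetical.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def brier_skill_score(probabilities, outcomes):
    """BSS = 1 - BS / BS_reference, with sample climatology as the reference.

    Assumption: the reference forecast is the event frequency in `outcomes`,
    which must contain both events and non-events.
    """
    climatology = sum(outcomes) / len(outcomes)
    reference = brier_score([climatology] * len(outcomes), outcomes)
    return 1.0 - brier_score(probabilities, outcomes) / reference


# Hypothetical forecasts for five storm cells (1 = damaging wind observed).
outcomes = [1, 0, 0, 1, 0]
forecast_probabilities = [0.9, 0.1, 0.2, 0.7, 0.1]
bss = brier_skill_score(forecast_probabilities, outcomes)

# Always forecasting the climatological frequency yields a score of zero.
bss_climatology = brier_skill_score([0.4] * 5, outcomes)
```

A forecast that always issues the climatological frequency scores zero by construction, which is why values above zero are read as an improvement over climatology.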