Meteorological records, including precipitation, commonly have missing values. Accurate imputation of missing precipitation values is challenging, however, because precipitation exhibits a high degree of spatial and temporal variability. Data-driven spatial interpolation of meteorological records is an increasingly popular approach in which missing values at a target station are imputed using synchronous data from reference stations. The success of spatial interpolation depends on whether precipitation records at the target station are strongly correlated with precipitation records at reference stations. However, the need for reference stations to have complete datasets implies that stations with incomplete records, even though strongly correlated with the target station, are excluded. To address this limitation, we develop a new sequential imputation algorithm for imputing missing values in spatio-temporal daily precipitation records. We demonstrate the benefits of sequential imputation by incorporating it within a spatial interpolation based on a Random Forest technique. Results show that for reliable imputation, having a few strongly correlated references is more effective than having a larger number of weakly correlated references. Further, we observe that sequential imputation becomes more beneficial as the number of stations with incomplete records increases. Overall, we present a new approach for imputing missing precipitation data which may also apply to other meteorological variables.
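The sequential idea in this abstract can be illustrated with a minimal sketch. The abstract does not spell out the algorithm, so everything below is an assumption: stations are processed from fewest to most missing values, each imputed station joins the reference pool for later targets, and a one-variable least-squares fit on the best-correlated reference stands in for the Random Forest regressor. The function names (`pearson`, `fit_line`, `sequential_impute`) are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def sequential_impute(records):
    """Fill None gaps station by station, reusing imputed stations as references.

    Assumes (for this sketch only) that at least one station is complete and
    that every incomplete station has at least two observed values.
    """
    filled = {name: list(series) for name, series in records.items()}
    complete = [n for n, s in filled.items() if None not in s]
    # Assumed ordering: stations with the fewest gaps are imputed first.
    incomplete = sorted(
        (n for n in filled if n not in complete),
        key=lambda n: filled[n].count(None),
    )
    for target in incomplete:
        obs = [i for i, v in enumerate(filled[target]) if v is not None]
        y = [filled[target][i] for i in obs]
        # Stand-in for the Random Forest: use only the best-correlated reference.
        best = max(
            complete,
            key=lambda r: abs(pearson([filled[r][i] for i in obs], y)),
        )
        slope, intercept = fit_line([filled[best][i] for i in obs], y)
        for i, v in enumerate(filled[target]):
            if v is None:
                # Precipitation cannot be negative, so clip at zero.
                filled[target][i] = max(0.0, slope * filled[best][i] + intercept)
        # The newly completed station is now available as a reference.
        complete.append(target)
    return filled
```

Called with a dictionary that maps station names to equal-length daily series (with `None` marking gaps), the function returns a filled copy and leaves the input untouched. The key design point is the final `complete.append(target)`: without it, a station with an incomplete record could never serve as a reference, which is the limitation the abstract describes.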
Machine learning can provide sustainable solutions to gap-fill the groundwater (GW) data needed to adequately constrain watershed models. However, imputing missing extremes is more challenging than imputing other parts of a hydrograph. To impute missing subhourly data, including extremes, within GW time-series data collected at multiple wells in the East River watershed, located in southwestern Colorado, we consider a single-well imputation (SWI) approach and a multiple-well imputation (MWI) approach. SWI gap-fills missing GW entries in a well using the same well's time-series data; MWI gap-fills a specific well's missing GW entries using the time series of neighboring wells. SWI takes advantage of linear interpolation and random forest (RF) approaches, whereas MWI exploits only the RF approach. We also use an information entropy framework to develop insights into how missing data patterns impact imputation. We discovered that if gaps were at random intervals, SWI could accurately impute up to 90% of missing data over an approximately two-year period. Contiguous gaps constituted more complex scenarios for imputation and required the use of MWI. Information entropy suggested that if gaps were contiguous, up to 50% of missing GW data could be estimated accurately over an approximately two-year period. The RF feature importance suggested that a time feature (months) and a space feature (neighboring wells) were the most important predictors in SWI and MWI. We also noted that neither the SWI nor the MWI method could capture the missing extremes of a hydrograph. To counter this, we developed a new sequential approach and demonstrated the imputation of missing extremes in a GW time series with high accuracy.
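The linear-interpolation part of single-well imputation can be sketched as follows. This is an illustrative assumption about how such gap-filling might look, not the authors' implementation: the function name `interpolate_gaps` is hypothetical, readings are assumed to be evenly spaced in time, and gaps at the very start or end of the series are left unfilled because they have no bracketing observation.

```python
def interpolate_gaps(series):
    """Fill interior None gaps by linear interpolation between known neighbors.

    Assumes evenly spaced readings; leading and trailing gaps stay None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over consecutive pairs of observed indices and fill what lies between.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] * (1 - weight) + filled[right] * weight
    return filled
```

For example, `interpolate_gaps([1.0, None, None, 4.0, None])` fills the two interior gaps and leaves the trailing one as `None`. Every interpolated value lies between its two bracketing observations, so a peak or trough that falls entirely inside a gap cannot be recovered this way, which is consistent with the difficulty in capturing extremes noted above.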
Meteorological forcing plays a critical role in accurately simulating the watershed hydrological cycle. With the advancement of high-performance computing and the development of integrated watershed models, simulating the watershed hydrological cycle at high temporal (hourly to daily) and spatial (tens of meters) resolution has become efficient and computationally affordable. These hyperresolution watershed models require high-resolution meteorological forcing as model input to ensure the fidelity and accuracy of simulated responses. In this study, we utilized the Advanced Terrestrial Simulator (ATS), an integrated watershed model, to simulate surface and subsurface flow and land surface processes using unstructured meshes at the Coal Creek Watershed near Crested Butte (Colorado). We compared simulated watershed hydrologic responses, including streamflow and distributed variables such as evapotranspiration, snow water equivalent (SWE), and groundwater table, driven by three publicly available gridded meteorological forcings (GMFs): Daily Surface Weather and Climatological Summaries (Daymet), the Parameter-elevation Regressions on Independent Slopes Model (PRISM), and the North American Land Data Assimilation System (NLDAS). Comparing PRISM at various spatial resolutions (ranging from 400 m to 4 km), we found that the simulated streamflow becomes only marginally worse when the spatial resolution of the meteorological forcing is coarsened to 4 km (or 30 % of the watershed area). However, the 4 km resolution performs much worse than finer resolutions for spatially distributed variables such as SWE. Using the temporally disaggregated PRISM, we compared models forced at different temporal resolutions (hourly to daily); sub-daily resolution preserves the dynamic watershed responses (e.g., diurnal fluctuation of streamflow) that are absent from results forced at daily resolution. Conversely, the simulated streamflow shows better performance with daily resolution than with sub-daily resolution. Our findings suggest that the choice of GMF and its spatiotemporal resolution depends on the quantity of interest and its spatial and temporal scale, which may have important implications for model calibration and watershed management decisions.
An accurate characterization of the water content of snowpack, or snow water equivalent (SWE), is necessary to quantify water availability and constrain hydrologic and land-surface models. Recently, airborne observations (e.g., lidar) have emerged as a promising method to accurately quantify SWE at high resolutions (scales of ∼100 m and finer). However, the frequency of these observations is very low, typically once or twice per season in the Rocky Mountains of Colorado. Here, we present a machine learning framework based on Random Forests to model temporally sparse lidar-derived SWE, enabling estimation of SWE at unmapped time points. We approximated the physical processes governing snow accumulation and melt, as well as snow characteristics, by obtaining fifteen different variables from gridded estimates of precipitation, temperature, surface reflectance, elevation, and canopy. Results showed that in the Rocky Mountains of Colorado, our framework is capable of modeling SWE with higher accuracy than estimates generated by the Snow Data Assimilation System (SNODAS). The mean value of the coefficient of determination (R²) using our approach was 0.57 and the root mean squared error (RMSE) was 13 cm, which was a significant improvement over SNODAS (mean R² = 0.13, RMSE = 20 cm). We explored the relative importance of the input variables and observed that, at a spatial resolution of 800 m, meteorological variables are more important drivers of predictive accuracy than surface variables that characterize the properties of snow on the ground. This research provides a framework to expand the applicability of lidar-derived SWE to unmapped time points.
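The two accuracy metrics quoted in this abstract can be computed as in the sketch below. The abstract does not state which definition of R² was used; this sketch assumes the common form 1 − SS_res / SS_tot rather than a squared correlation coefficient, and the function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the inputs (e.g., cm of SWE)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Under this definition, R² can be negative when a model predicts worse than the mean of the observations, and `r_squared` raises a `ZeroDivisionError` if all observed values are identical.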