Abstract: Green roofs are increasingly popular measures to permanently reduce or delay storm-water runoff. The main objective of the study was to examine the potential of using machine learning (ML) to simulate runoff from green roofs to estimate their hydrological performance. Four machine learning methods, artificial neural network (ANN), M5 model tree, long short-term memory (LSTM) and k nearest neighbour (kNN), were applied to simulate storm-water runoff from 16 extensive green roofs located in four Norweg…
“…This setup permitted simultaneous training and simulation over thousands of sites or more. However, in many other machine learning studies, following the conventional wisdom of stratification, geoscientists still tend to train separate models using data from each site (Duan et al., 2020; Herath et al., 2021; Petty & Dhingra, 2018), or each region composed of sites with similar environmental conditions (Abdalla et al., 2021; Sahoo et al., 2017).…”
When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is a customary practice to stratify a large domain into multiple regions (or regimes) and study each region separately. Traditional wisdom suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, each stratified model has access to fewer and less diverse data points. Here, through two hydrologic examples (soil moisture and streamflow), we show that conventional wisdom may no longer hold in the era of big data and deep learning (DL). We systematically examined an effect we call data synergy, where the results of the DL models improved when data were pooled together from characteristically different regions. The performance of the DL models benefited from modest diversity in the training data compared to a homogeneous training set, even with similar data quantity. Moreover, allowing heterogeneous training data makes eligible much larger training datasets, which is an inherent advantage of DL. A large, diverse data set is advantageous in terms of representing extreme events and future scenarios, which has strong implications for climate change impact assessment. The results here suggest the research community should place greater emphasis on data sharing.
“…The process of model calibration or dataset training was based on a backpropagation algorithm by adjusting the weights and computing the error between the output and the corresponding target value and propagating this backward through the network to adjust weights and produce the desired output (Abdalla et al., 2021). The optimum number of neurons in the hidden layers was based on a trial-and-error process, starting with a small number of neurons, which was gradually increased until obtaining the lowest forecasting error (mean square error function).…”
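The trial-and-error search described in this statement can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the one-hidden-layer network, the `tanh` activation, the learning rate, the candidate sizes, and the synthetic sine-shaped data are all assumptions made for the example, and the function names (`train_mlp`, `select_hidden_size`) are hypothetical. It sweeps a fixed list of hidden-layer sizes and keeps the one with the lowest validation mean square error, rather than stopping at the first increase.

```python
import math
import random


def train_mlp(xs, ys, n_hidden, epochs=300, lr=0.05, seed=0):
    """Train a 1-input, 1-output network with one tanh hidden layer
    using backpropagation (per-sample gradient descent on squared error)."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # Forward pass: hidden activations, then linear output.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
            out = sum(w2[j] * h[j] for j in range(n_hidden)) + b2
            # Error between output and target (gradient of 0.5 * err**2).
            err = out - y
            # Backward pass: propagate the error to each weight.
            for j in range(n_hidden):
                grad_pre = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_pre * x
                b1[j] -= lr * grad_pre
            b2 -= lr * err

    def predict(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
        return sum(w2[j] * h[j] for j in range(n_hidden)) + b2

    return predict


def mse(predict, xs, ys):
    """Mean square error of a predict function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_hidden_size(train, valid, sizes=(1, 2, 4, 8)):
    """Try increasing hidden-layer sizes; return the size with the
    lowest validation MSE together with all recorded errors."""
    errors = {}
    for n in sizes:
        model = train_mlp(train[0], train[1], n)
        errors[n] = mse(model, valid[0], valid[1])
    best = min(errors, key=errors.get)
    return best, errors


# Synthetic stand-in data (an assumption, not taken from any cited study).
xs = [i / 39 for i in range(40)]
ys = [0.5 * math.sin(2 * math.pi * x) for x in xs]
train = (xs[0::2], ys[0::2])
valid = (xs[1::2], ys[1::2])
best, errors = select_hidden_size(train, valid)
print("best hidden size:", best)
```

Scoring each candidate on held-out points rather than on the training points is a choice of this sketch; the quoted statement does not say which data the forecasting error was computed on.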
“…25,31,37,55 In addition, in recent years, data-driven methods such as machine learning techniques have been investigated. 56 Yet, data-driven methods will not be discussed either, because field data scarcity is a common issue that managers and developers face to train and test models. Other software packages, such as MUSICX, 57 have previously been considered for green roof modeling, but as these are not as widely used and either require licenses for use or are not open source, they will not be discussed.…”
Section: Overview of GR Models (mentioning)
confidence: 99%
“…Stormwater control primarily relates to the mechanical process of water movement (infiltration) within GR substrate. 30,56,59 Simulation of soil water transport, thus, is … [Fig. 1: Components and water fluxes of a simplified GR model.]…”
Section: Modeling Soil Water Transport (mentioning)
confidence: 99%
“…18,34-36,78,92,94 The second strategy is cross-validation, in which models are tested among different sites and climate forcings. 41,56,58 This strategy is becoming increasingly popular, because it is important for GR planning that the model can predict the performance of new implementation when data are unavailable. Nevertheless, the transferred model often fails to predict GR outflow at different sites.…”
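One way to picture testing a model across sites is a leave-one-site-out loop: fit on all sites except one, then score the prediction on the held-out site. The sketch below is only an illustration under stated assumptions. The site names, the rainfall and runoff numbers, and the helper names (`fit_line`, `leave_one_site_out`) are hypothetical, a straight-line fit stands in for whatever green roof model is actually being transferred, and the Nash-Sutcliffe efficiency (NSE) is chosen here as the score because it is common in hydrology, not because the quoted statement names it.

```python
def fit_line(pairs):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect match, values below 0
    mean the model is worse than predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def leave_one_site_out(sites):
    """For each site, train on all other sites pooled together and
    evaluate on the held-out site. Returns {site_name: NSE}."""
    scores = {}
    for held_out in sites:
        train = [p for name, data in sites.items()
                 if name != held_out for p in data]
        slope, intercept = fit_line(train)
        obs = [y for _, y in sites[held_out]]
        sim = [slope * x + intercept for x, _ in sites[held_out]]
        scores[held_out] = nse(obs, sim)
    return scores


# Hypothetical (rainfall, runoff) pairs; site_c behaves differently.
rain = [1.0, 2.0, 3.0, 4.0, 5.0]
sites = {
    "site_a": [(r, 0.5 * r) for r in rain],
    "site_b": [(r, 0.5 * r) for r in rain],
    "site_c": [(r, 0.1 * r) for r in rain],
}
for name, score in leave_one_site_out(sites).items():
    print(name, round(score, 2))
```

In this toy setup the model fitted on `site_a` and `site_b` scores a negative NSE on `site_c`, which mirrors the kind of transfer failure the statement describes when a held-out site responds differently from the training sites.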