Models of water resources systems are conceived to capture the underlying environmental dynamics occurring within watersheds. All such models can be regarded as working hypotheses that differ in process representation and conceptualization. Most of the associated effort in the water resources research community is dedicated to developing new models that perform well under specific atmospheric conditions and catchment properties. In this context, flexible modeling frameworks are gaining importance because they facilitate model building by providing model building blocks that the hydrologist is free to assemble for the task at hand. Such flexible models have a high degree of transferability, which in turn aids progress toward a unified hydrological theory at the catchment scale. However, without sufficient insight into a catchment's characteristics or without expert knowledge, one may have to try a large number of model configurations based on the available building blocks to construct an appropriate model for the catchment of interest. This can be time consuming and computationally intensive. This paper proposes a novel model building algorithm that uses the full potential of flexible modeling frameworks by searching the model space and inferring suitable model configurations through machine learning. The proposed machine learning algorithm is based on an evolutionary computation approach using genetic programming (GP). State-of-the-art GP applications in rainfall-runoff modeling have so far used the algorithm as a short-term forecasting tool that generates an expected future time series, much like neural network applications. In contrast, the proposed algorithm develops a physically meaningful rainfall-runoff model. Although at the moment we learn models using two flexible modeling frameworks (SUPERFLEX and FUSE), the model induction toolkit can be armed with any internally coherent collection of building blocks.
The model induction capabilities of the proposed framework have been evaluated on the Blackwater River basin, Alabama, United States. The model configurations evolved through the model induction toolkit are consistent with the fieldwork investigations and previously reported research findings.
Fixed Models Versus Flexible Models
Design of conceptual models traditionally begins with a perceptual model derived from insights gained on the basis of fieldwork and experience, proceeding through a mathematical formulation of the hypothesized structure to the numerically robust implementation in a computer code. Conceptual hydrological modeling can be broadly classified into single- and multiple-hypothesis (often referred to as flexible) modeling approaches. Development of models with fixed structure is based on the identification of a general model structure that is physically realistic and applicable to a reasonably wide range of catchments and climatic conditions. Several
Abstract. Despite showing great success of applications in many commercial fields, machine learning and data science models generally show limited success in many scientific fields, including hydrology (Karpatne et al., 2017). The approach is often criticized for its lack of interpretability and physical consistency. This has led to the emergence of new modelling paradigms, such as theory-guided data science (TGDS) and physics-informed machine learning. The motivation behind such approaches is to improve the physical meaningfulness of machine learning models by blending existing scientific knowledge with learning algorithms. Following the same principles in our prior work (Chadalawada et al., 2020), a new model induction framework was founded on genetic programming (GP), namely the Machine Learning Rainfall–Runoff Model Induction (ML-RR-MI) toolkit. ML-RR-MI is capable of developing fully fledged lumped conceptual rainfall–runoff models for a watershed of interest using the building blocks of two flexible rainfall–runoff modelling frameworks. In this study, we extend ML-RR-MI towards inducing semi-distributed rainfall–runoff models. The meaningfulness and reliability of hydrological inferences gained from lumped models may tend to deteriorate within large catchments where the spatial heterogeneity of forcing variables and watershed properties is significant. This was the motivation behind developing our machine learning approach for distributed rainfall–runoff modelling titled Machine Induction Knowledge Augmented – System Hydrologique Asiatique (MIKA-SHA). MIKA-SHA captures spatial variabilities and automatically induces rainfall–runoff models for the catchment of interest without any explicit user selections. Currently, MIKA-SHA learns models utilizing the model building components of two flexible modelling frameworks. However, the proposed framework can be coupled with any internally coherent collection of building blocks. 
MIKA-SHA's model induction capabilities have been tested on the Rappahannock River basin near Fredericksburg, Virginia, USA. MIKA-SHA builds and tests many model configurations using the model building components of the two flexible modelling frameworks and quantitatively identifies the optimal model for the watershed of concern. In this study, MIKA-SHA is utilized to identify two optimal models (one from each flexible modelling framework) to capture the runoff dynamics of the Rappahannock River basin. Both optimal models achieve high-efficiency values in hydrograph predictions (both at catchment and subcatchment outlets) and good visual matches with the observed runoff response of the catchment. Furthermore, the resulting model architectures are compatible with previously reported research findings and fieldwork insights of the watershed and are readily interpretable by hydrologists. MIKA-SHA-induced semi-distributed model performances were compared against existing lumped model performances for the same basin. MIKA-SHA-induced optimal models outperform the lumped models used in this study in terms of efficiency values while benefitting hydrologists with more meaningful hydrological inferences about the runoff dynamics of the Rappahannock River basin.
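The "efficiency values" mentioned above are not defined in this abstract. As a hedged illustration only, the sketch below computes the Nash-Sutcliffe efficiency, a common hydrograph efficiency measure; that this is the metric used in the study is an assumption, and the function name is hypothetical.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the
    skill of always predicting the observed mean."""
    if len(observed) == 0 or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors between observed and simulated flows.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Spread of the observations around their own mean.
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - residual / variance
```

A value close to 1 at both the catchment and subcatchment outlets would correspond to the good hydrograph match described above.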
Abstract. Despite great success in many commercial applications, machine learning and data science models in general show limited use in scientific fields, including hydrology. The approach is often criticized for its lack of interpretability and physical consistency. This has led to the emergence of new paradigms, such as theory-guided data science (TGDS) and physics-informed machine learning. The motivation behind such approaches is to improve the physical meaningfulness of machine learning models by blending existing scientific knowledge with learning algorithms. Following the same principles, in our prior work (Chadalawada et al., 2020), a new model induction framework was founded on Genetic Programming (GP), namely the Machine Learning Rainfall-Runoff Model Induction Toolkit (ML-RR-MI). ML-RR-MI is capable of developing fully-fledged lumped conceptual rainfall-runoff models for a watershed of interest using the building blocks of two flexible rainfall-runoff modelling frameworks (FUSE and SUPERFLEX). In this study, we extend ML-RR-MI towards inducing semi-distributed rainfall-runoff models. This effort is motivated by the decreasing meaningfulness of lumped models, which tends to deteriorate particularly within large catchments where the spatial heterogeneity of forcing variables and watershed properties is significant. Hence, our machine learning approach for rainfall-runoff modelling, titled Machine Induction Knowledge-Augmented System Hydrologique Asiatique (MIKA-SHA), captures spatial variabilities and automatically induces rainfall-runoff models for the catchment of interest without any subjectivity in model selection. Currently, MIKA-SHA learns models utilizing the model building components of FUSE and SUPERFLEX. However, the proposed framework can be coupled with any internally coherent collection of building blocks.
MIKA-SHA’s model induction capabilities have been tested on the Red Creek catchment near Vestry, Mississippi, United States. The model architectures produced by MIKA-SHA are compatible with previously reported research findings and fieldwork insights of the watershed and are readily interpretable by hydrologists.
Genetic programming (GP) is a widely used machine learning (ML) algorithm that has been applied in water resources science and engineering since its conception in the early 1990s. However, similar to other ML applications, the GP algorithm is often used as a data fitting tool rather than as a model building instrument. We find this a gross underutilization of the GP capabilities. The most distinctive feature of GP, which sets it apart from other ML techniques, is its capability to produce explicit mathematical relationships between input and output variables. Theory-guided data science (TGDS) recently emerged as a new paradigm in ML with the main goal of blending the existing body of knowledge with ML techniques to induce physically sound models, and it has evolved into a popular data science paradigm, especially in scientific disciplines including water resources. Following these ideas, in our prior work, we developed two hydrologically informed rainfall-runoff model induction toolkits based on GP, one for lumped modelling and one for distributed modelling. In the current work, the two toolkits are applied using a different hydrological model building library. Here, the model building blocks are derived from the Sugawara TANK model template, which represents the elements of hydrological knowledge. Results are compared against the traditional GP approach and suggest that GP as a rainfall-runoff model induction toolkit preserves the prediction power of the traditional GP short-term forecasting approach while helping to better understand the catchment runoff dynamics through the readily interpretable induced models.
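To make the idea of a TANK-style building block concrete, here is a minimal sketch of a single storage tank with one side outlet and one bottom outlet, chained into a two-tank cascade. It is a simplified illustration, not the toolkit's function set: the class name, parameter names and the two-tank layout are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Tank:
    """One simplified tank (hypothetical building block)."""
    side_coeff: float     # fraction of storage above the outlet released per step
    side_height: float    # height of the side outlet (mm)
    bottom_coeff: float   # fraction of storage draining downward per step
    storage: float = 0.0  # current water depth in the tank (mm)

    def __post_init__(self) -> None:
        # Keeps storage non-negative: outflows can never exceed storage.
        if self.side_coeff < 0 or self.bottom_coeff < 0:
            raise ValueError("coefficients must be non-negative")
        if self.side_coeff + self.bottom_coeff > 1:
            raise ValueError("side_coeff + bottom_coeff must not exceed 1")

    def step(self, inflow: float) -> Tuple[float, float]:
        """Add inflow, then return (side outflow, bottom outflow)."""
        self.storage += inflow
        side = self.side_coeff * max(self.storage - self.side_height, 0.0)
        bottom = self.bottom_coeff * self.storage
        self.storage -= side + bottom
        return side, bottom


def simulate_two_tanks(rainfall: Sequence[float],
                       upper: Tank, lower: Tank) -> List[float]:
    """Runoff is the sum of both side outlets; the upper tank's bottom
    outflow feeds the lower tank, whose bottom outflow is lost."""
    runoff = []
    for p in rainfall:
        q_upper, percolation = upper.step(p)
        q_lower, _deep_loss = lower.step(percolation)
        runoff.append(q_upper + q_lower)
    return runoff
```

Because each outlet is an explicit equation, a model assembled from such blocks stays readable, which is the interpretability benefit the abstract describes.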
Relative dominance of the runoff controls, such as topography, geology, soil types, land use, and climate, may differ from catchment to catchment due to spatial and temporal heterogeneity of landscape properties and climate variables. Understanding dominant runoff controls is an essential task in developing unified hydrological theories at the catchment scale. Semi-distributed rainfall-runoff models are often used to identify dominant runoff controls for a catchment of interest. In most such applications, the model selection is based on either expert judgement or experimental and fieldwork insights. Model selection is the most important step in any hydrological modelling exercise, as the findings are largely influenced by the selected model. Hence, a subjective model selection without sufficient expert knowledge or experimental insights may result in biased findings, especially for comparative studies such as the identification of dominant runoff controls. In this study, we use a physics-informed machine learning toolbox based on genetic programming, Machine Induction Knowledge Augmented - System Hydrologique Asiatique (MIKA-SHA), to identify the relative dominance of runoff controls. We find the quantitative and automated approach based on MIKA-SHA to be highly appropriate for the intended task. MIKA-SHA does not require explicit user selections and relies on data and fundamental hydrological processes. The approach is tested using the Rappahannock River basin at Remington, Virginia, United States. Two rainfall-runoff models are learnt to represent the runoff dynamics of the catchment using topography-based and soil-type-based hydrologic response units independently. Based on prediction capabilities, in this case, the topography is identified as the dominant runoff driver.
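The abstract does not say how runoff from individual hydrologic response units (HRUs) is combined. One common convention in semi-distributed modelling, assumed here purely for illustration, is an area-weighted sum; the function name and the HRU labels in the usage note are hypothetical.

```python
from typing import Dict, List


def aggregate_hru_runoff(hru_runoff: Dict[str, List[float]],
                         hru_area: Dict[str, float]) -> List[float]:
    """Combine per-HRU runoff depths into one catchment series by
    weighting each HRU with its share of the total area."""
    if not hru_runoff or set(hru_runoff) != set(hru_area):
        raise ValueError("runoff and area must cover the same HRUs")
    lengths = {len(series) for series in hru_runoff.values()}
    if len(lengths) != 1:
        raise ValueError("all HRU series must have the same length")
    total_area = sum(hru_area.values())
    if total_area <= 0:
        raise ValueError("total area must be positive")
    n_steps = lengths.pop()
    catchment = [0.0] * n_steps
    for name, series in hru_runoff.items():
        weight = hru_area[name] / total_area
        for t, q in enumerate(series):
            catchment[t] += weight * q
    return catchment
```

For example, `aggregate_hru_runoff({"hillslope": [2.0, 4.0], "floodplain": [0.0, 8.0]}, {"hillslope": 3.0, "floodplain": 1.0})` weights the hillslope series by 0.75 and the floodplain series by 0.25. Running the same aggregation for a topography-based and a soil-type-based HRU set, and comparing the prediction skill of the two results, mirrors the comparison described above.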
<p>Modelling the rainfall-runoff phenomenon continues to be a challenging task for hydrologists, as the underlying processes are highly nonlinear, dynamic and interdependent. Numerous modelling strategies (empirical, conceptual, physically based and data driven) are used to develop rainfall-runoff models, as no model type can be considered universally pertinent for a wide range of problems. Recent literature emphasizes that the crucial step of hydrological model selection is often subjective and based on legacy. As the research outcome depends on model choice, there is a need to automate the process of model evolution, evaluation and selection based on research objectives, temporal and spatial characteristics of available data and catchment properties. Therefore, this study proposes a novel automated model building algorithm relying on the machine learning technique Genetic Programming (GP).</p><p>State-of-the-art GP applications in rainfall-runoff modelling have so far used the algorithm as a short-term forecasting tool that produces an expected future time series, much like neural network applications. Such simplistic applications of data-driven black-box machine learning techniques may lead to accurate yet meaningless models that do not satisfy basic hydrological insights and may be severely difficult to interpret. At the same time, it should be acknowledged that there is a vast amount of knowledge and understanding of physical processes that should not simply be thrown away. 
Thus, we strongly believe that the most suitable way forward is to couple the already existing body of knowledge with machine learning techniques in a guided manner to enhance the meaningfulness and interpretability of the induced models.</p><p>In the suggested algorithm, domain knowledge is introduced by adding model building blocks from prevailing rainfall-runoff modelling frameworks into the GP function set. Presently, the function set library consists of Sugawara TANK model functions, generic components of two flexible rainfall-runoff modelling frameworks (FUSE and SUPERFLEX) and model equations of 46 existing hydrological models (MARRMoT). Perhaps more importantly, the algorithm can readily be integrated with any other internally coherent collection of building blocks. This approach differs from other machine learning applications in rainfall-runoff modelling in that it not only produces runoff predictions but also develops a physically meaningful hydrological model, which helps the hydrologist to better understand the catchment dynamics. The proposed algorithm considers the model space and automatically identifies the appropriate model configurations for a catchment of interest by optimizing user-defined learning objectives in a multi-objective optimization framework. The model induction capabilities of the proposed algorithm have been evaluated on the Blackwater River basin, Alabama, United States. The model configurations evolved through the model-building algorithm are compatible with the fieldwork investigations and previously reported research findings.</p>
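<p>The abstract does not name the multi-objective optimization scheme. As a hedged sketch of the underlying idea only, the code below keeps the candidate model configurations that are not Pareto-dominated when every objective is minimized. The function names are hypothetical, and the example objectives in the usage note (one minus efficiency, number of model components) are assumptions.</p>

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates,
    in their original insertion order."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front
```

<p>With candidates scored as (one minus efficiency, number of components), `pareto_front({"A": (0.2, 5), "B": (0.3, 3), "C": (0.3, 6), "D": (0.1, 9)})` returns `["A", "B", "D"]`: configuration C is dropped because A is both more accurate and simpler.</p>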