Abstract. Extreme events such as heat waves, cold spells, floods, droughts, tropical cyclones, and tornadoes have potentially devastating impacts on natural and engineered systems and human communities worldwide. Stakeholder decisions about critical infrastructures, natural resources, emergency preparedness and humanitarian aid typically need to be made at local to regional scales over seasonal to decadal planning horizons. However, credible climate change attribution and reliable projections at more localized…
“…For example, the use of physical principles to constrain spatiotemporal pattern mining algorithms has been explored in [81], [82] for finding ocean eddies from satellite data. The need to explore TGDS models for uncertainty quantification is discussed in [33] in the context of understanding and projecting climate extremes. Scientific knowledge can also be used to advance other aspects of data science, e.g., the design of scientific work-flows [83], [84] or the generation of model simulations [85].…”
“…Some examples include the discovery of novel climate patterns and relationships [18], [19], closure of knowledge gaps in turbulence modeling efforts [20], [21], discovery of novel compounds in material science [22], [23], [24], design of density functionals in quantum chemistry [25], improved imaging technologies in bio-medical science [26], [27], discovery of genetic biomarkers [28], and the estimation of surface water dynamics at a global scale [29], [30]. These efforts have been complemented with recent review papers [8], [31], [32], [33], workshops (e.g., a 2016 conference on physics informed machine learning [34]) and industry initiatives (e.g., a recent IBM Research initiative on "physical analytics" [35]).…”
Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science.
“…Domain expertise will be needed to frame questions, identify inputs, construct suitable model architectures, and interpret results. There is also evidence that domain expertise of physical principles can improve machine learning outcomes (Ganguly et al, 2014; Karpatne et al, 2017).…”
Deep learning (DL), a new generation of artificial neural network research, has transformed industries, daily lives, and various scientific disciplines in recent years. DL represents significant progress in the ability of neural networks to automatically engineer problem‐relevant features and capture highly complex data distributions. I argue that DL can help address several major new and old challenges facing research in water sciences such as interdisciplinarity, data discoverability, hydrologic scaling, equifinality, and needs for parameter regionalization. This review paper is intended to provide water resources scientists and hydrologists in particular with a simple technical overview, transdisciplinary progress update, and a source of inspiration about the relevance of DL to water. The review reveals that various physical and geoscientific disciplines have utilized DL to address data challenges, improve efficiency, and gain scientific insights. DL is especially suited for information extraction from image‐like data and sequential data. Techniques and experiences presented in other disciplines are of high relevance to water research. Meanwhile, less noticed is that DL may also serve as a scientific exploratory tool. A new area termed AI neuroscience, where scientists interpret the decision process of deep networks and derive insights, has been born. This budding subdiscipline has demonstrated methods including correlation‐based analysis, inversion of network‐extracted features, reduced‐order approximations by interpretable models, and attribution of network decisions to inputs. Moreover, DL can also use data to condition neurons that mimic problem‐specific fundamental organizing units, thus revealing emergent behaviors of these units. Vast opportunities exist for DL to propel advances in water sciences.
“…Several studies have been performed on extreme precipitation and temperature under climate change (Solomon et al 2007; Kao and Ganguly 2011; Coumou and Rahmstorf 2012; Field et al 2012; Stocker et al 2013; Ganguly et al 2014; Kodra and Ganguly 2014). However, the impact of climate change on wind extremes has not received similar attention even though they have effects on energy sectors (Barthelmie 2010, 2013), design and safety of buildings and bridges (ASCE 7-05: Minimum Design Loads for Building and Structures), insurance industry (Born and Viscusi 2006), and coastal ecosystems (Iles et al 2012).…”
are statistically not significant over most regions. The MME model simulates the spatial patterns of extreme winds for 25-100 year return periods. The projected extreme winds from GCMs exhibit statistically less significant trends compared to the historical reference period.