Abstract: A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state-of-practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data a…
“…Similar approaches have been mirrored in science and engineering in recent years. These efforts are now being formalized through FAIR (findable, accessible, interoperable and reusable) initiatives [55,56] in the context of scientific datasets [57], research software [58] and AI models [8,59]. This study represents yet another significant step in this direction.…”
Section: Discussion
confidence: 99%
“…With the explosion of AI models [1][2][3][4][5] developed to predict various material properties over the recent years, it has become difficult to keep track of the available AI models and the datasets that are used for training and inference. Numerous efforts [6,7] have been made toward the integration of AI models and their associated datasets in one place to streamline their use for a wide range of applications and a broad community of users [8][9][10]. AI models and datasets are often available through open repositories, in the best scenario, so a user can download, deploy and reproduce their putative capabilities.…”
We introduce an end-to-end computational framework that enables hyperparameter optimization with the DeepHyper library, accelerated model training, and interpretable AI inference. The framework is based on state-of-the-art AI models, including CGCNN, PhysNet, SchNet, MPNN, MPNN-transformer, and TorchMD-NET. We employ these AI models along with the benchmark QM9, hMOF, and MD17 datasets to showcase how the models can predict user-specified material properties within modern computing environments. We demonstrate transferable applications in the modeling of small molecules, inorganic crystals, and nanoporous metal-organic frameworks with a unified, standalone framework. We have deployed and tested this framework on the ThetaGPU supercomputer at the Argonne Leadership Computing Facility and on the Delta supercomputer at the National Center for Supercomputing Applications to provide researchers with modern tools to conduct accelerated AI-driven discovery in leadership-class computing environments. We release these digital assets as open-source scientific software on GitLab and as ready-to-use Jupyter notebooks on Google Colab.
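The hyperparameter optimization step mentioned above can be illustrated with a minimal random-search sketch in plain Python. This shows the idea only, not the DeepHyper API; the search space, objective surface, and all names are hypothetical:

```python
import random

# Hypothetical search space for a graph neural network
# (names and ranges are illustrative, not taken from the paper).
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-2),
    "hidden_dim": [64, 128, 256],
    "num_layers": [2, 3, 4, 5],
}

def sample_config(rng):
    """Draw one hyperparameter configuration from the space."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": rng.uniform(lo, hi),
        "hidden_dim": rng.choice(SEARCH_SPACE["hidden_dim"]),
        "num_layers": rng.choice(SEARCH_SPACE["num_layers"]),
    }

def objective(config):
    """Stand-in for validation error after training with `config`.

    A real workflow would train a model (CGCNN, SchNet, ...) and
    return a validation metric; this toy surface merely prefers
    mid-sized models and moderate learning rates.
    """
    return (abs(config["hidden_dim"] - 128) / 128
            + abs(config["num_layers"] - 3)
            + abs(config["learning_rate"] - 1e-3) * 100)

def random_search(n_trials=50, seed=0):
    """Evaluate n_trials random configs; return the best (lowest) one."""
    rng = random.Random(seed)
    best_cfg, best_val = None, float("inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        val = objective(cfg)
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val
```

Libraries such as DeepHyper replace the naive random sampler with smarter search strategies and distribute the (expensive) objective evaluations across the nodes of a supercomputer, which is what makes this step tractable at leadership-class scale.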
“…Beyond these original contributions, we also provide an end-to-end framework that unifies initial data production, construction of BCs and their use to train, validate and test the performance and reliability of AI surrogates. These activities aim to create FAIR (findable, accessible, interoperable and reusable) and AI-ready datasets and AI models [23][24][25][26].…”
We present a critical analysis of physics-informed neural operators to solve partial differential equations that are ubiquitous in the study and modeling of physics phenomena, using carefully curated datasets. Further, we provide a benchmarking suite which can be used to evaluate physics-informed neural operators in solving such problems. We first demonstrate that our methods reproduce the accuracy and performance of other neural operators published elsewhere in the literature to learn the 1D wave equation and the 1D Burgers equation. Thereafter, we apply our physics-informed neural operators to learn new types of equations, including the 2D Burgers equation in the scalar, inviscid and vector types. Finally, we show that our approach is also applicable to learn the physics of the 2D linear and nonlinear shallow water equations, which involve three coupled partial differential equations. We release our artificial intelligence surrogates and scientific software to produce initial data and boundary conditions to study a broad range of physically motivated scenarios. We provide the \href{https://github.com/shawnrosofsky/PINO_Applications/tree/main}{source code}, an interactive \href{https://shawnrosofsky.github.io/PINO_Applications/}{website} to visualize the predictions of our physics-informed neural operators, and a tutorial for their use at the \href{https://www.dlhub.org}{Data and Learning Hub for Science}.
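For reference, the 1D equations named above take the following standard forms; the training objective sketched here is the generic physics-informed loss, not necessarily the exact formulation used in the paper:

```latex
% 1D Burgers equation (viscous form), viscosity \nu:
\partial_t u + u\,\partial_x u = \nu\,\partial_{xx} u
% 1D wave equation, wave speed c:
\partial_{tt} u = c^2\,\partial_{xx} u
% Generic physics-informed loss: a data-fit term on initial/boundary
% conditions plus the PDE residual \mathcal{R}[u_\theta] penalized at
% collocation points; the weights \lambda are hyperparameters:
\mathcal{L}(\theta) =
  \lambda_{\mathrm{data}}\,\bigl\| u_\theta - u \bigr\|^2_{\mathrm{IC/BC}}
  + \lambda_{\mathrm{pde}}\,\bigl\| \mathcal{R}[u_\theta] \bigr\|^2
```

Setting the residual to $\mathcal{R}[u_\theta] = \partial_t u_\theta + u_\theta\,\partial_x u_\theta - \nu\,\partial_{xx} u_\theta$, for instance, recovers a physics-informed objective for the viscous Burgers equation.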
“…Specifically, the FAIR principles were originally introduced [7] as guidelines for the management and stewardship of scientific datasets to optimize their reuse. Recently, the FAIR for Research Software (FAIR4RS) working group has developed an interpretation of the FAIR principles specifically for research software [8][9][10][11], and FAIR principles have also been applied in the context of benchmarking and tool development [12], and on the creation of computational frameworks for AI models [13].…”
The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning models—algorithms that have been trained on data without being explicitly programmed—and more generally, artificial intelligence (AI) models, are an important target for this because of the ever-increasing pace with which AI is transforming scientific domains, such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for the application of these principles. We demonstrate the template’s use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability.
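The graph-neural-network approach described above rests on message passing over a graph of jet constituents. The following is a minimal sketch of one sum-aggregation round in plain Python; the graph, features, and update rule are illustrative only, not the paper's architecture (a real GNN layer would apply learned weight matrices and nonlinearities):

```python
def message_passing_round(node_features, edges):
    """One round of sum-aggregation message passing.

    node_features: dict mapping node id -> feature vector (list of floats)
    edges: list of (src, dst) pairs; messages flow src -> dst

    Each node's new feature is its own feature plus the sum of the
    features arriving along its incoming edges.
    """
    dim = len(next(iter(node_features.values())))
    # Accumulate incoming messages per destination node.
    incoming = {n: [0.0] * dim for n in node_features}
    for src, dst in edges:
        for i, x in enumerate(node_features[src]):
            incoming[dst][i] += x
    # Update: residual-style combination of self and aggregated messages.
    return {
        n: [a + b for a, b in zip(node_features[n], incoming[n])]
        for n in node_features
    }
```

Stacking several such rounds lets information propagate across the graph, so each constituent's final representation reflects its neighborhood; a readout over all nodes then feeds the classification head (here, Higgs vs. background).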