2023
DOI: 10.1039/d3dd00082f

Interpretable models for extrapolation in scientific machine learning

Eric S. Muckley, James E. Saal, Bryce Meredig, et al.

Abstract: Data-driven models are central to scientific discovery. In efforts to achieve state-of-the-art model accuracy, researchers are employing increasingly complex machine learning algorithms that often outperform simple regressions in interpolative settings...
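
To make the abstract's interpolation-versus-extrapolation distinction concrete, here is a minimal, hypothetical sketch (not the paper's benchmark, data, or models): a linear model and a random forest are compared under a random (interpolative) split and under a split that holds out the highest target values as a crude stand-in for extrapolation. All data, models, and split sizes below are illustrative assumptions.

```python
# Toy illustration (not the paper's benchmark): compare a linear model and a
# random forest on a random (interpolative) split vs. a split that holds out
# the largest-y samples (a crude extrapolative test).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 3))
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 0.5, 500)

def evaluate(train_idx, test_idx, label):
    for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
        model.fit(X[train_idx], y[train_idx])
        mae = mean_absolute_error(y[test_idx], model.predict(X[test_idx]))
        print(f"{label:15s} {type(model).__name__:25s} MAE = {mae:.2f}")

# Interpolative case: random 80/20 split.
idx = rng.permutation(len(y))
evaluate(idx[:400], idx[400:], "random split")

# Extrapolative case: train on the lower 80% of y, test on the top 20%.
order = np.argsort(y)
evaluate(order[:400], order[400:], "extrapolation")
```

Because the synthetic target here is generated from a linear relationship, the linear model should hold up under the extrapolative split, while a tree ensemble typically degrades there: tree predictions cannot exceed the range of target values seen in training.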

Cited by 18 publications (25 citation statements)
References 53 publications (75 reference statements)
“…Once hypothetically relevant features correlated to the output are selected, relatively simple models can be constructed to make extrapolations. Even simple linear models can be quite effective for this purpose. The features themselves need not have an interpretable relationship to the property being studied (vide infra); they merely serve as a proxy for guiding the experiment selection.…”
Section: Recommendations Toward ML For Exceptional Materials
Mentioning (confidence: 99%)
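A minimal sketch of the workflow this statement describes, under my own assumptions: candidate features are screened by their correlation with the target, a plain linear model is fit on the surviving features, and its predictions are used only to rank unmeasured candidates for the next experiment. The synthetic data, the 0.3 correlation threshold, and the top-5 ranking are illustrative, not taken from the cited work.

```python
# Hypothetical sketch: correlation-based feature screening followed by a
# simple linear model used only to rank candidates for the next experiment.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_measured = rng.normal(size=(60, 8))        # 8 candidate descriptors
y_measured = 3 * X_measured[:, 2] - X_measured[:, 5] + rng.normal(0, 0.3, 60)

# Keep only features whose |Pearson r| with the target exceeds a threshold.
r = np.array([np.corrcoef(X_measured[:, j], y_measured)[0, 1]
              for j in range(X_measured.shape[1])])
selected = np.flatnonzero(np.abs(r) > 0.3)

model = LinearRegression().fit(X_measured[:, selected], y_measured)

# Rank a pool of unmeasured candidates by predicted property value.
X_pool = rng.normal(size=(200, 8))
scores = model.predict(X_pool[:, selected])
next_experiments = np.argsort(scores)[::-1][:5]   # top-5 candidates to try
print("selected feature indices:", selected)
print("suggested candidates:", next_experiments)
```

The point of the sketch is that the model is a means to prioritize experiments, so its features need not be mechanistically meaningful, which matches the statement's "proxy" framing.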
“…Model explainability in these early stages is unnecessary because the models will be based on limited data and thus prone to overfitting and oversimplification. Moreover, the most appropriate models for initial discovery, for both interpretability and extrapolation, may be the types of feature-selected linear models discussed above, obviating the need for more sophisticated black-box model interpretability methods. In fact, empirical studies have found XAI detrimental in uncertain environments, as humans are more likely to reject helpful recommendations because of overconfidence in their troubleshooting abilities.…”
Section: Recommendations Toward ML For Exceptional Materials
Mentioning (confidence: 99%)
“…[24] In fact, simple linear models built with an appropriate combination of input features are often better at extrapolating to novel examples.[25] Recent articles in this Journal have described activities to introduce chemistry students to ML techniques, including the use of ML classifier models to distinguish functional groups in IR spectra,[26] modeling the response of metal nanoparticle colorimetric sensors using neural networks,[27] chemometric analysis of wines,[28] and unsupervised clustering of FTIR and mass-spectrometry data for whisky, tea, and fruit.[29] In addition to teaching practical skills, these activities also implicitly teach students to be aware of limitations and possible failures of ML, including issues with data quantity and quality (e.g., data set imbalances, domain shifts) and effects on prediction quality.…”
Section: Introduction
Mentioning (confidence: 99%)
“…Even simple regularized linear regression models suffice to predict chemical properties as diverse as molecular atomization energies, molecular orbital energies, and interatomic potentials, or to analyze photocurrent spectroscopy experiments. In fact, simple linear models built with an appropriate combination of input features are often better at extrapolating to novel examples. Recent articles in this Journal have described activities to introduce chemistry students to ML techniques, including the use of ML classifier models to distinguish functional groups in IR spectra, modeling the response of metal nanoparticle colorimetric sensors using neural networks, chemometric analysis of wines, and unsupervised clustering of FTIR and mass-spectrometry data for whisky, tea, and fruit.…”
Section: Introduction
Mentioning (confidence: 99%)
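As a rough, self-contained sketch of the kind of regularized linear model this statement refers to (the descriptors, property values, and regularization strength below are invented for illustration), ridge regression yields one coefficient per descriptor that can be read off directly as a feature weight.

```python
# Illustrative only: ridge regression on synthetic "descriptor -> property"
# data; the fitted coefficients give an interpretable weight per descriptor.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
descriptors = rng.normal(size=(100, 4))   # e.g. simple composition features
property_values = (1.5 * descriptors[:, 0] - 2.0 * descriptors[:, 3]
                   + rng.normal(0, 0.2, 100))

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(descriptors, property_values)

# Inspect the per-descriptor weights on the standardized scale.
for name, coef in zip(["d1", "d2", "d3", "d4"],
                      model.named_steps["ridge"].coef_):
    print(f"{name}: {coef:+.2f}")
```

Standardizing the inputs before fitting keeps the coefficients comparable across descriptors, which is what makes this kind of model easy to interpret alongside its predictive use.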