A very active area of materials research is devising methods that use machine learning to automatically extract predictive models from existing materials data. While prior examples have demonstrated successful models for some applications, many more applications exist where machine learning can make a strong impact. To enable faster development of machine-learning-based models for such applications, we have created a framework that can be applied to a broad range of materials data. Our method works by using a chemically diverse list of attributes, which we demonstrate are suitable for describing a wide variety of properties, together with a novel method for partitioning the data set into groups of similar materials in order to boost predictive accuracy. In this manuscript, we demonstrate how this new method can be used to predict diverse properties of crystalline and amorphous materials, such as band gap energy and glass-forming ability.
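The partitioning idea above (split the data into groups of similar materials, then fit a separate model per group) can be sketched in plain Python. The group labels, target values, and the per-group "model" (a simple group mean) below are illustrative assumptions, not the framework's actual algorithm:

```python
# Minimal sketch of the group-then-fit idea: partition training samples
# by a grouping key (here, a made-up material-class label), fit one
# simple model (the group mean) per partition, and fall back to a
# global model for unseen groups. Data are illustrative only.
from collections import defaultdict

def fit_grouped_means(samples):
    """samples: list of (group_label, target_value) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, y in samples:
        sums[group] += y
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}

def predict(models, group, fallback):
    """Use the matching group's model, or the fallback for unseen groups."""
    return models.get(group, fallback)

train = [("oxide", 3.0), ("oxide", 3.4), ("metal", 0.0), ("metal", 0.2)]
models = fit_grouped_means(train)
global_mean = sum(y for _, y in train) / len(train)
print(predict(models, "oxide", global_mean))   # prediction from the oxide group
print(predict(models, "halide", global_mean))  # unseen group: global fallback
```

Real partitioned models would replace the group mean with a full regressor per group; the structure of the dispatch stays the same.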
As materials data sets grow in size and scope, the role of data mining and statistical learning methods to analyze these materials data sets and build predictive models is becoming more important. This manuscript introduces matminer, an open-source, Python-based software platform to facilitate data-driven methods of analyzing and predicting materials properties. Matminer provides modules for retrieving large data sets from external databases such as the Materials Project, Citrination, Materials Data Facility, and Materials Platform for Data Science. It also provides implementations for an extensive library of feature extraction routines developed by the materials community, with 44 featurization classes that can generate thousands of individual descriptors and combine them into mathematical functions. Finally, matminer provides a visualization module for producing interactive, shareable plots. These functions are designed in a way that integrates closely with machine learning and data analysis packages already developed and in use by the Python data science community. We explain the structure and logic of matminer, provide a description of its various modules, and showcase several examples of how matminer can be used to collect data, reproduce data mining studies reported in the literature, and test new methodologies.
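The core pattern a featurization library like this standardizes is a class that maps one input object to a fixed list of named descriptors. A minimal pure-Python sketch of that pattern follows; the class name, property table, and method names here are illustrative assumptions, not matminer's actual API:

```python
# Illustrative featurizer in the style a featurization library
# standardizes: one input (a composition given as element -> amount)
# becomes a fixed, labeled descriptor vector. The property table holds
# rounded Pauling electronegativities; this is a sketch, not real
# matminer code.
ELECTRONEGATIVITY = {"Fe": 1.83, "O": 3.44, "Si": 1.90}

class MeanElectronegativity:
    def feature_labels(self):
        """Names for each descriptor this featurizer produces."""
        return ["mean electronegativity"]

    def featurize(self, composition):
        """composition: dict mapping element symbol -> atomic amount."""
        total = sum(composition.values())
        mean = sum(ELECTRONEGATIVITY[el] * amt
                   for el, amt in composition.items()) / total
        return [mean]

feat = MeanElectronegativity()
print(feat.feature_labels())
print(feat.featurize({"Fe": 2, "O": 3}))  # composition-weighted mean for Fe2O3
```

Because every featurizer exposes the same interface, descriptor vectors from many featurizers can be concatenated into one design matrix for any downstream ML library.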
While high-throughput Density Functional Theory (DFT) has become a prevalent tool for materials discovery, it is limited by its relatively large computational cost. In this paper, we explore using DFT data from high-throughput calculations to create faster, surrogate models with machine learning (ML) that can be used to guide new searches. Our method works by using decision tree models to map DFT-calculated formation enthalpies to a set of attributes consisting of two distinct types: (i) composition-dependent attributes of elemental properties (as have been used in previous ML models of DFT formation energies), combined with (ii) attributes derived from the Voronoi tessellation of the compound's crystal structure. ML models created using this method have half the cross-validation error of, and similar training and evaluation speeds to, models created with the Coulomb matrix and Pair Radial Distribution Function (PRDF) methods. For a dataset of 435,000 formation energies taken from the Open Quantum Materials Database (OQMD), our model achieves a mean absolute error (MAE) of 80 meV/atom in cross-validation, which is lower than the approximate error between DFT-computed and experimentally-measured formation enthalpies and below 15% of the mean absolute deviation of the training set. We also demonstrate that our method can accurately estimate the formation energy of materials outside of the training set and be used to identify materials with especially large formation enthalpies. We propose that our models can be used to accelerate the discovery of new materials by identifying the most promising materials to study with DFT at little additional computational cost.
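The surrogate-model idea (map attributes to DFT-calculated values with a decision tree) can be illustrated at its smallest scale with a depth-1 regression tree, i.e., a single split on one scalar attribute. The data and the stump fitter below are a sketch under that simplification; the paper's models use many attributes and full tree ensembles:

```python
# Sketch of a depth-1 regression tree ("stump"): choose the split
# threshold on a scalar attribute that minimizes squared error, and
# predict the mean of each side. Data are made up for illustration.
def fit_stump(xs, ys):
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # skip degenerate splits
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    return best[1:]  # (threshold, left_mean, right_mean)

def predict_stump(model, x):
    threshold, lm, rm = model
    return lm if x <= threshold else rm

# Toy attribute (x) vs. target (y); a real model would use hundreds of
# composition- and Voronoi-derived attributes.
model = fit_stump([1, 2, 3, 4], [0.0, 0.0, 1.0, 1.0])
print(model)
```

Tree ensembles (random forests, gradient boosting) are built by combining many such splits, which is why they train and evaluate quickly relative to recomputing DFT.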
Conventional machine learning approaches for predicting material properties from elemental compositions have emphasized the importance of leveraging domain knowledge when designing model inputs. Here, we demonstrate that by using a deep learning approach, we can bypass such manual feature engineering requiring domain knowledge and achieve much better results, even with only a few thousand training samples. We present the design and implementation of a deep neural network model referred to as ElemNet; it automatically captures the physical and chemical interactions and similarities between different elements, which allows it to predict materials properties with better accuracy and speed. The speed and best-in-class accuracy of ElemNet enable us to perform a fast and robust screening for new material candidates in a huge combinatorial space, where we predict hundreds of thousands of chemical systems that could contain yet-undiscovered compounds.
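The key contrast with hand-engineered attributes is the input encoding: a raw, fixed-length vector of elemental fractions, from which the network learns its own representations. A sketch of that encoding follows; the (truncated) element ordering is an illustrative assumption, since the actual model uses a vector spanning the full set of elements in its training data:

```python
# ElemNet-style input encoding sketch: a composition becomes a
# fixed-length vector of elemental fractions, with no hand-crafted
# attributes. The element list here is truncated for illustration.
ELEMENTS = ["H", "C", "N", "O", "Fe", "Si"]

def fraction_vector(composition):
    """composition: dict mapping element symbol -> atomic amount."""
    total = sum(composition.values())
    return [composition.get(el, 0.0) / total for el in ELEMENTS]

print(fraction_vector({"Fe": 2, "O": 3}))  # -> [0.0, 0.0, 0.0, 0.6, 0.4, 0.0]
```

Because the vector length is fixed regardless of the compound, the same network can score any composition in a combinatorial search space with one forward pass each.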
Coupling artificial intelligence with high-throughput experimentation accelerates discovery of amorphous alloys.
Traditional machine learning (ML) metrics overestimate model performance for materials discovery. We introduce (1) leave-one-cluster-out cross-validation (LOCO CV) and (2) a simple nearest-neighbor benchmark to show that model performance in discovery applications strongly depends on the problem, data sampling, and extrapolation. Our results suggest that ML-guided iterative experimentation may outperform standard high-throughput screening for discovering breakthrough materials like high-Tc superconductors with ML.

Materials informatics (MI), or the application of data-driven algorithms to materials problems, has grown quickly as a field in recent years.9 Across all of these applications, a training database of simulated or experimentally measured materials properties serves as input to an ML algorithm that predictively maps features (i.e., materials descriptors) to target materials properties. Ideally, the result of training such models would be the experimental realization of new materials with promising properties. The MI community has produced several such success stories, including thermoelectric compounds,10,11 shape-memory alloys,12 superalloys,13 and 3d-printable high-strength aluminum alloys.14 However, in many cases, a model is itself the output of a study, and the question becomes: to what extent could the model be used to drive materials discovery?

Typically, the performance of ML models of materials properties is quantified via cross-validation (CV). CV can be performed either as a single division of the available data into a training set (to build the model) and a test set (to evaluate its performance), or as an ensemble process known as k-fold CV, wherein the data are partitioned into k nonoverlapping subsets of nearly equal size (folds) and model performance is averaged across each combination of k-1 training folds and one test fold. Leave-one-out cross-validation (LOOCV) is the limit where k is the number of total examples in the dataset.
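LOCO CV replaces the random folds described above with folds defined by clusters of similar materials, so each test fold probes extrapolation to an unseen family. A minimal sketch of the split generator, with illustrative cluster labels (the paper clusters materials algorithmically), is:

```python
# Sketch of leave-one-cluster-out CV: instead of random k-fold splits,
# hold out one whole cluster of similar materials at a time, so the
# model is always tested on a family it never saw during training.
# Cluster labels below are illustrative.
def loco_splits(cluster_labels):
    """Yield (train_indices, test_indices) with one cluster held out."""
    for held_out in sorted(set(cluster_labels)):
        test = [i for i, c in enumerate(cluster_labels) if c == held_out]
        train = [i for i, c in enumerate(cluster_labels) if c != held_out]
        yield train, test

labels = ["A", "A", "B", "B", "B", "C"]
splits = list(loco_splits(labels))
for train, test in splits:
    print(train, test)
```

Random k-fold CV would scatter each cluster across both sides of every split, which is exactly the interpolation-only evaluation the authors argue overestimates discovery performance.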
Table 1 summarizes some examples of model performance statistics as reported in the aforementioned studies (some studies involved testing multiple algorithms across multiple properties). In Table 1, the reported model performance is uniformly excellent across all studies. A tempting conclusion is that any of these models could be used for one-shot high-throughput screening of large numbers of materials for desired properties. However, as we discuss below, traditional CV has critical shortcomings in terms of quantifying ML model performance for materials discovery.

Issues with traditional cross-validation for materials discovery

Many ML benchmark problems consist of data classification into discrete bins, i.e., pattern matching. For example, the ...

Design, System, Application

Machine learning (ML) has become a widely-adopted predictive tool for materials design and discovery. Random k-fold cross-validation (CV), the traditional gold-standard approach for evaluating the quality of ML models, is fundamentally mismatched to the nature of materials discovery, and leads to ...
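The nearest-neighbor benchmark this work proposes amounts to predicting each test target as the target of its closest training point; a model that cannot beat this memorization baseline has learned little beyond pattern matching. A minimal 1-NN sketch, with made-up feature vectors and targets, is:

```python
# Sketch of a 1-nearest-neighbor baseline: predict a query's target as
# the target of the closest training point in feature space (squared
# Euclidean distance). Data below are illustrative only.
def nn_predict(train_x, train_y, query):
    dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[dists.index(min(dists))]

train_x = [(0.0, 0.0), (1.0, 1.0)]
train_y = [10.0, 20.0]
print(nn_predict(train_x, train_y, (0.2, 0.1)))  # closest to the first point
```

Under random k-fold CV, dense sampling of each material family means the nearest neighbor is usually a near-duplicate, which is one mechanism behind the uniformly excellent numbers in Table 1.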