Propelled partly by the Materials Genome Initiative, and partly by algorithmic developments and the resounding successes of data-driven efforts in other domains, informatics strategies are beginning to take shape within materials science. These approaches lead to surrogate machine learning models that enable rapid predictions based purely on past data rather than on direct experimentation or on computations/simulations in which fundamental equations are explicitly solved. Data-centric informatics methods are becoming useful for determining material properties that are hard to measure or compute using traditional methods (due to the cost, time, or effort involved) but for which reliable data either already exists or can be generated for at least a subset of the critical cases. Predictions are typically interpolative: a material is first fingerprinted numerically, and a mapping (established via a learning algorithm) is then followed from the fingerprint to the property of interest. Fingerprints may be of many types and scales, as dictated by the application domain and needs. Predictions may also be extrapolative, extending into new materials spaces, provided prediction uncertainties are properly taken into account. This article attempts to provide an overview of some of the successful data-driven "materials informatics" strategies undertaken in the last decade, and identifies some challenges the community is facing and those that should be overcome in the near future.
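The fingerprint-to-property mapping described above can be illustrated with a minimal sketch. It assumes a simple kernel-weighted average as the learning algorithm, which is a stand-in chosen for brevity and not a method named in the abstract. The fingerprints, property values, and the `support` indicator are all hypothetical. `support` is only a crude signal of how far a query lies from past data, not a calibrated prediction uncertainty.

```python
import math

def kernel(a, b, length_scale):
    """Gaussian similarity between two fingerprint vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2.0 * length_scale ** 2))

def predict(fingerprint, train_fingerprints, train_values, length_scale=1.0):
    """Kernel-weighted average of known property values.

    Returns (prediction, support). `support` is the total kernel weight:
    a value near zero means the query is far from all past data, so any
    prediction there would be an extrapolation.
    """
    weights = [kernel(fingerprint, f, length_scale) for f in train_fingerprints]
    support = sum(weights)
    if support == 0.0:
        # No past data anywhere near the query: decline to predict.
        return None, 0.0
    prediction = sum(w * v for w, v in zip(weights, train_values)) / support
    return prediction, support

# Toy data: hypothetical 2-component fingerprints and made-up property values.
train_fingerprints = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
train_values = [1.0, 2.0, 3.0]

# A query between known materials (interpolative case).
inside, inside_support = predict((0.5, 0.0), train_fingerprints, train_values)

# A query far from all known materials (extrapolative case).
outside, outside_support = predict((10.0, 10.0), train_fingerprints, train_values, 0.1)
```

In practice, a model that returns a proper predictive variance (for example, Gaussian process regression) would replace the `support` heuristic.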
The recent successes of the Materials Genome Initiative have opened up new opportunities for data-centric informatics approaches in several subfields of materials research, including polymer science and engineering. Polymers, being inexpensive and possessing a broad range of tunable properties, are widespread in many technological applications. The vast chemical and morphological complexity of polymers, however, gives rise to challenges in the rational discovery of new materials for specific applications. The nascent field of polymer informatics seeks to provide tools and pathways for accelerated property prediction (and materials design) via surrogate machine learning models built on reliable past data. We have carefully accumulated a data set of organic polymers whose properties were obtained either computationally (bandgap, dielectric constant, refractive index, and atomization energy) or experimentally (glass transition temperature, solubility parameter, and density). A fingerprinting scheme that captures atomistic to morphological structural features was developed to numerically represent the polymers. Machine learning models were then trained by mapping the fingerprints (or features) to properties. Once developed, these models can rapidly predict properties of new polymers (within the same chemical class as the parent data set) and can also provide uncertainties underlying the predictions. Since different properties depend on different length-scale features, the prediction models were built on an optimized set of features for each individual property. Furthermore, these models are incorporated in a user-friendly online platform named Polymer Genome. Systematic and progressive expansion of both chemical and property spaces is planned to extend the applicability of Polymer Genome to a wide range of technological domains.
Simulations based on solving the Kohn-Sham (KS) equation of density functional theory (DFT) have become a vital component of modern materials and chemical sciences research and development portfolios. Despite their versatility, routine DFT calculations are usually limited to a few hundred atoms due to the computational bottleneck posed by the KS equation. Here we introduce a machine-learning-based scheme to efficiently assimilate the function of the KS equation, and bypass it to directly, rapidly, and accurately predict the electronic structure of a material or a molecule, given just its atomic configuration. A new rotationally invariant representation is utilized to map the atomic environment around a grid-point to the electron density and local density of states at that grid-point. This mapping is learned using a neural network trained on previously generated reference DFT results at millions of grid-points. The proposed paradigm allows high-fidelity emulation of KS DFT at speeds orders of magnitude faster than the direct solution. Moreover, the machine learning prediction scheme is strictly linear-scaling with system size.
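The idea of a rotationally invariant description of the atomic environment around a grid-point can be sketched as follows. This is a minimal illustration assuming a purely radial fingerprint built from sums of Gaussians of atom-to-grid-point distances. It is not the representation used in the work above, and the function names, Gaussian widths, cutoff, and coordinates are hypothetical. Because the features depend only on distances, rotating the atoms about the grid-point leaves them unchanged.

```python
import math

def grid_fingerprint(grid_point, atoms, widths=(0.5, 1.0, 2.0), cutoff=6.0):
    """Radial fingerprint of the atomic environment around one grid-point.

    For each Gaussian width w, sums exp(-r^2 / (2 w^2)) over all atoms
    within `cutoff` of the grid-point, where r is the atom's distance
    from the grid-point. Only distances enter, so the result is invariant
    to rotations about the grid-point.
    """
    features = []
    for w in widths:
        total = 0.0
        for atom in atoms:
            r = math.dist(grid_point, atom)
            if r <= cutoff:
                total += math.exp(-r * r / (2.0 * w * w))
        features.append(total)
    return features

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

# Hypothetical atomic positions around a grid-point at the origin.
atoms = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5), (-1.5, 1.0, -1.0)]
origin = (0.0, 0.0, 0.0)

original = grid_fingerprint(origin, atoms)
rotated = grid_fingerprint(origin, [rotate_z(a, 0.7) for a in atoms])
# `original` and `rotated` agree up to floating-point rounding.
```

The finite cutoff means each grid-point sees only a bounded neighborhood. With a suitable neighbor list (not shown), the total cost then grows in proportion to the number of grid-points, which is one way a grid-based scheme can scale linearly with system size.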
Understanding the behavior (and failure) of dielectric insulators experiencing extreme electric fields is critical to the operation of present and emerging electrical and electronic devices. Despite its importance, the development of a predictive theory of dielectric breakdown has remained a challenge, owing to the complex multiscale nature of this process. Here, we focus on the intrinsic dielectric breakdown field of insulators, that is, the theoretical limit of breakdown determined purely by the chemistry of the material: the elements the material is composed of, the atomic-level structure, and the bonding. Starting from a benchmark data set (generated from laborious first-principles computations) of the intrinsic dielectric breakdown field of a variety of model insulators, simple predictive phenomenological models of dielectric breakdown are distilled using advanced statistical or machine learning schemes, revealing key correlations and analytical relationships between the breakdown field and easily accessible material properties. The models are shown to be general, and can hence guide the screening and systematic identification of materials that tolerate high electric fields.
Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.
Solubility parameter models are widely used to select suitable solvents/nonsolvents for polymers in a variety of processing and engineering applications. In this study, we focus on two well-established models, namely, the Hildebrand and Hansen solubility parameter models. Both models are built on the notion of "like dissolves like" and identify a liquid as a good solvent for a polymer if the solubility parameters of the liquid and the polymer are close to each other. Here we make a critical and quantitative assessment of the accuracy/utility of these two models by comparing their predictions against actual experimental data. Using a data set of 75 polymers, we find that the Hildebrand model displays a predictive accuracy of 60% for solvents and 76% for nonsolvents. The Hansen model performs similarly; on the basis of a data set of 25 polymers for which Hansen parameters are available, we find that it has an accuracy of 67% for solvents and 76% for nonsolvents. The availability of Hildebrand parameters for a large polymer data set makes the Hildebrand model widely applicable: the Hildebrand parameter for a new polymer may be predicted using this data set and machine learning methods, as we have done before, and the predicted parameter may then be used to determine suitable solvents and nonsolvents. Such predictions are difficult to make with the Hansen model, as the data set of Hansen parameters for polymers is rather small. Nevertheless, the Hildebrand approach must be used with caution. Our analysis shows that while the Hildebrand model has a predictive accuracy of 70−75% for nonpolar polymers, it performs rather poorly for polar polymers (with an accuracy of 57%).
Going forward, the determination of solvents and nonsolvents for polymers may benefit from classification models built directly on available experimental data sets rather than from the solubility parameter approach, which is limited in versatility and accuracy.
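The "like dissolves like" criterion behind both solubility parameter models can be sketched in a few lines. The Hansen distance below uses the standard form Ra^2 = 4(Δδd)^2 + (Δδp)^2 + (Δδh)^2. The Hildebrand tolerance, the interaction radius, and all parameter values are hypothetical placeholders for illustration, not the thresholds or data used in the study.

```python
import math

def hildebrand_is_solvent(delta_polymer, delta_liquid, tolerance=2.0):
    """Predict a good solvent when the two Hildebrand parameters are close.

    `tolerance` (in the same units as the parameters, e.g. MPa^0.5) is an
    assumed cutoff chosen only for illustration.
    """
    return abs(delta_polymer - delta_liquid) <= tolerance

def hansen_distance(polymer, liquid):
    """Hansen distance Ra between two (dispersion, polar, hydrogen-bonding) triples."""
    dd, dp, dh = (a - b for a, b in zip(polymer, liquid))
    return math.sqrt(4.0 * dd ** 2 + dp ** 2 + dh ** 2)

def hansen_is_solvent(polymer, liquid, interaction_radius):
    """Predict a good solvent when the liquid lies inside the polymer's Hansen sphere."""
    return hansen_distance(polymer, liquid) <= interaction_radius

def accuracy(predicted, observed):
    """Fraction of predictions that match the experimental labels."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical parameters (MPa^0.5); not taken from any data set.
polymer_hildebrand = 18.0
liquids_hildebrand = [18.5, 19.5, 30.0]
observed = [True, False, False]  # made-up experimental labels

predicted = [hildebrand_is_solvent(polymer_hildebrand, d) for d in liquids_hildebrand]
score = accuracy(predicted, observed)
```

Scoring the predictions separately on the experimentally known solvents and nonsolvents, as the study does, only requires calling `accuracy` on each subset of labels.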