The lack of reliable methods for identifying descriptors (the sets of parameters capturing the underlying mechanisms of a materials property) is one of the key factors hindering efficient materials development. Here, we propose a systematic approach for discovering descriptors for materials properties, within the framework of compressed-sensing-based dimensionality reduction. SISSO (sure independence screening and sparsifying operator) tackles immense and correlated feature spaces, and converges to the optimal solution from a combination of features relevant to the materials property of interest. In addition, SISSO gives stable results even with small training sets. The methodology is benchmarked with the quantitative prediction of the ground-state enthalpies of octet binary materials (using ab initio data) and applied to the showcase example of predicting the metal/insulator classification of binaries (with experimental data). Accurate, predictive models are found in both cases. For the metal/insulator classification model, the predictive capability is tested beyond the training data: it rediscovers the available pressure-induced insulator→metal transitions and allows for the prediction of yet unknown transition candidates, ripe for experimental validation. As a step forward with respect to previous model-identification methods, SISSO can become an effective tool for automatic materials development.
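The two stages named in the acronym can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`build_features`, `sis`, `l0_fit`, `sisso`), the operator set (+, -, *, /), the use of absolute Pearson correlation as the screening score, and the synthetic data are all assumptions made for illustration. The sketch builds a small candidate feature space from primary features, screens it against the current residual (sure independence screening), and then searches the retained candidates exhaustively for the best low-dimensional linear model (the sparsifying operator, here an ℓ0 search by least squares).

```python
import itertools
import numpy as np


def build_features(primary, names):
    """Toy feature space: primary features plus pairwise +, -, *, / combinations."""
    feats, labels = list(primary.T), list(names)
    for i, j in itertools.combinations(range(primary.shape[1]), 2):
        a, b = primary[:, i], primary[:, j]
        feats += [a + b, a - b, a * b, a / b]
        labels += [f"({names[i]}+{names[j]})", f"({names[i]}-{names[j]})",
                   f"({names[i]}*{names[j]})", f"({names[i]}/{names[j]})"]
    return np.column_stack(feats), labels


def sis(features, target, k, exclude=()):
    """Screening step: indices of the k features most correlated with target."""
    f = features - features.mean(axis=0)
    t = target - target.mean()
    score = np.abs(f.T @ t) / (np.linalg.norm(f, axis=0) * np.linalg.norm(t) + 1e-12)
    if exclude:
        score[list(exclude)] = -1.0  # never re-select an already retained feature
    return list(np.argsort(score)[::-1][:k])


def l0_fit(features, target, candidates, dim):
    """Sparsifying step: exhaustive search for the best dim-tuple by least squares."""
    best = (np.inf, None, None)
    ones = np.ones(len(target))
    for combo in itertools.combinations(candidates, dim):
        X = np.column_stack([features[:, list(combo)], ones])
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        rmse = np.sqrt(np.mean((X @ coef - target) ** 2))
        if rmse < best[0]:
            best = (rmse, combo, coef)
    return best


def sisso(features, target, max_dim=2, k=5):
    """Alternate screening on the residual with an l0 search over the retained set."""
    selected, residual, model = [], target, None
    for dim in range(1, max_dim + 1):
        selected += sis(features, residual, k, exclude=selected)
        rmse, combo, coef = l0_fit(features, target, selected, dim)
        X = np.column_stack([features[:, list(combo)], np.ones(len(target))])
        residual = target - X @ coef  # what the current model fails to explain
        model = (rmse, combo, coef)
    return model


# Synthetic data (an assumption for this sketch): the target is an exact
# linear function of two candidate features, a*b and c/d.
rng = np.random.default_rng(0)
primary = rng.uniform(1.0, 2.0, size=(60, 4))
names = ["a", "b", "c", "d"]
features, labels = build_features(primary, names)
target = 2.0 * primary[:, 0] * primary[:, 1] - 3.0 * primary[:, 2] / primary[:, 3] + 0.5

rmse, combo, coef = sisso(features, target, max_dim=2, k=5)
descriptor = sorted(labels[i] for i in combo)
print(descriptor, rmse)
```

In this toy setting the exhaustive search is cheap because screening keeps only a handful of candidates per iteration; the same division of labor is what makes a very large candidate space tractable, although real applications involve far larger spaces and more operators than this sketch covers.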
The availability of big data in materials science offers new routes for analyzing materials properties and functions and achieving scientific understanding. Finding structure in these data that is not directly visible with standard tools, and exploiting the scientific information they contain, requires new and dedicated methodology based on approaches from statistical learning, compressed sensing, and other recent methods from applied mathematics, computer science, statistics, signal processing, and information science. In this paper, we explain and demonstrate a compressed-sensing-based methodology for feature selection, specifically for discovering physical descriptors, i.e., physical parameters that describe the material and its properties of interest, and associated equations that explicitly and quantitatively describe those relevant properties. As a showcase application and proof of concept, we describe how to build a physical model for the quantitative prediction of the crystal structure of binary compound semiconductors.
The identification of descriptors of materials properties and functions that capture the underlying physical mechanisms is a critical goal in data-driven materials science. Only such descriptors will enable a trustworthy and efficient scanning of materials spaces and possibly the discovery of new materials. Recently, the sure-independence screening and sparsifying operator (SISSO) was introduced and successfully applied to a number of materials-science problems. SISSO is a compressed-sensing-based methodology yielding predictive models that are expressed in the form of analytical formulas, built from simple physical properties. These formulas are systematically selected from an immense number (billions or more) of candidates. In this work, we describe a powerful extension of the methodology to a 'multi-task learning' approach, which identifies a single descriptor capturing multiple target materials properties at the same time. This approach is specifically suited for a heterogeneous materials database with scarce or partial data, e.g., one in which not all properties are reported for all materials in the training set. As showcase examples, we address the construction of materials-property maps for the relative stability of octet-binary compounds, considering several crystal phases simultaneously, and the metal/insulator classification of binary materials distributed over many crystal prototypes.