This Editorial is intended for materials scientists interested in performing machine-learning-centered research. We cover broad guidelines and best practices for obtaining and treating data, feature engineering, model training, validation, evaluation and comparison, popular repositories for materials data and benchmarking datasets, model and architecture sharing, and finally publication. In addition, we include interactive Jupyter notebooks with example Python code to demonstrate some of the concepts, workflows, and best practices discussed. Overall, the data-driven methods and machine learning workflows and considerations are presented simply, allowing interested readers to guide their machine learning research more intelligently using the suggested references, best practices, and their own materials domain expertise.
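As a minimal, hypothetical sketch of the train/validate/evaluate workflow the Editorial's notebooks demonstrate (not code from the notebooks themselves), the snippet below uses scikit-learn; the feature matrix `X` and property vector `y` are synthetic placeholders for a real materials dataset.

```python
# A minimal sketch of a supervised ML workflow: split, train, evaluate.
# X and y are random placeholders, not real materials data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 20))                                  # placeholder features
y = X @ rng.random(20) + 0.1 * rng.standard_normal(500)   # placeholder property

# Hold out a test set once; never use it for model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print("test MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```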
In this paper, we demonstrate an application of the Transformer self-attention mechanism in the context of materials science. Our network, the Compositionally Restricted Attention-Based network (CrabNet), explores structure-agnostic materials property prediction when only a chemical formula is provided. Our results show that CrabNet's performance matches or exceeds current best-practice methods on nearly all of 28 total benchmark datasets. We also demonstrate how CrabNet's architecture lends itself to model interpretability through several visualization approaches made possible by its design. We feel confident that CrabNet and its attention-based framework will be of keen interest to future materials informatics researchers.
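To make the core idea concrete, here is a minimal sketch of self-attention over the elements of a chemical formula. This illustrates the general mechanism described in the abstract, not CrabNet's actual architecture; the embedding sizes, fraction encoding, and pooling choice are all illustrative assumptions.

```python
# Minimal sketch: treat each element of a formula as a token, add an
# encoding of its molar fraction, and let elements attend to each other.
import torch
import torch.nn as nn

class CompositionAttention(nn.Module):
    def __init__(self, n_elements=103, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(n_elements, d_model)  # one vector per element
        self.frac_proj = nn.Linear(1, d_model)          # encode molar fractions
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out = nn.Linear(d_model, 1)                # scalar property head

    def forward(self, elem_idx, fracs):
        # elem_idx: (batch, n_tokens) atomic numbers; fracs: (batch, n_tokens)
        x = self.embed(elem_idx) + self.frac_proj(fracs.unsqueeze(-1))
        x, _ = self.attn(x, x, x)       # each element attends to the others
        return self.out(x.mean(dim=1))  # pool over elements -> property

# SiO2 as two element tokens (Si: Z=14, O: Z=8) with fractions 1/3 and 2/3
model = CompositionAttention()
print(model(torch.tensor([[14, 8]]), torch.tensor([[1/3, 2/3]])))
```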
New methods for describing materials as vectors, in order to predict their properties using machine learning, are common in the field of materials informatics. However, little is known about the comparative efficacy of these methods. This work sets out to clarify which featurization methods should be used under various circumstances. Our findings include, surprisingly, that simple one-hot encoding of elements can be as effective as traditional and newer descriptors when large amounts of data are available. However, we show that when datasets are small or not fully representative, descriptors built on domain knowledge offer advantages in predictive ability.
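For readers unfamiliar with the baseline the abstract mentions, here is a minimal sketch of one-hot element encoding for a composition. The fraction-weighted variant shown is one common choice, not necessarily the exact scheme used in the paper.

```python
# Minimal sketch: encode a composition as a fixed-length vector with each
# element's molar fraction placed at the index of its atomic number.
import numpy as np

N_ELEMENTS = 103  # index elements by atomic number

def one_hot_composition(elements, fractions):
    vec = np.zeros(N_ELEMENTS)
    for z, frac in zip(elements, fractions):
        vec[z - 1] = frac
    return vec

# SiO2: Si (Z=14) with fraction 1/3, O (Z=8) with fraction 2/3
v = one_hot_composition([14, 8], [1/3, 2/3])
print(v.nonzero())  # -> indices 7 and 13 (O and Si)
```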
In this paper, we evaluate an attention-based neural network architecture for predicting inorganic materials properties given access to nothing but each material's chemical composition. We demonstrate that this novel application of self-attention for materials property prediction strikingly outperforms both statistical and ensemble machine learning methods, as well as a fully-connected neural network. This Compositionally-Restricted Attention-Based network, referred to as CrabNet, achieves improved test metrics on six of seven tested materials properties from the AFLOW database. Moreover, we show that CrabNet outperforms the other methods in the absence of chemical information, even when the statistical and ensemble learning techniques are given domain-specific chemical knowledge about the materials. Given its impressive improvement in predictive accuracy over previous methods, as well as its minimal hardware requirements for training and prediction, we feel confident that CrabNet, and the ideas explored within, will be central to future materials informatics research.
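For contrast with the attention sketch above, here is a minimal sketch of the kind of fully-connected baseline the abstract compares against: a plain MLP over fixed-length composition vectors (for example, the one-hot vectors shown earlier). The layer sizes are illustrative assumptions, not the architecture from the paper.

```python
# Minimal sketch of a fully-connected baseline over composition vectors.
import torch
import torch.nn as nn

mlp_baseline = nn.Sequential(
    nn.Linear(103, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 1),  # scalar property prediction
)
print(mlp_baseline(torch.zeros(1, 103)).shape)  # -> torch.Size([1, 1])
```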