2020
DOI: 10.1021/acs.chemmater.0c01907

Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices

Abstract: This Editorial is intended for materials scientists interested in performing machine learning-centered research. We cover broad guidelines and best practices regarding the obtaining and treatment of data, feature engineering, model training, validation, evaluation and comparison, popular repositories for materials data and benchmarking datasets, model and architecture sharing, and finally publication. In addition, we include interactive Jupyter notebooks with example Python code to demonstrate some of the conce…

Cited by 296 publications (252 citation statements)
References 109 publications
“…Within the context of ML, it is common practice not to fit or train the regression on the entire dataset but instead to split this dataset into a training and test set. [27,48] The training set is used to fit the regression, and the quality of the resulting model (i.e., generalization error) is then assessed by quantifying the performance of this model instance, using the mean-absolute-error (MAE) or root-mean-squared-error (RMSE), on the test set. Furthermore, in more complex (regression) models, one is required to have an additional "test set"-the validation set-at hand to fit the hyper parameters of the model.…”
Section: Limitation Of Small Datasets For Machine Learning Based Regression (mentioning)
confidence: 99%
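
The split-and-evaluate procedure described in the citation statement above can be illustrated with a minimal sketch. This is not the cited authors' code: the synthetic data, the one-dimensional ridge model, the 60/20/20 split, the penalty grid, and every function name (`split_indices`, `fit_ridge_1d`, `pick`, `mae`, `rmse`) are assumptions made for illustration only. The model is fitted on the training set, the hyperparameter is chosen on the validation set, and MAE and RMSE are reported on the held-out test set.

```python
import math
import random


def split_indices(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle 0..n-1 and cut the result into train/validation/test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def fit_ridge_1d(xs, ys, lam):
    """Fit y = a*x + b by least squares with an L2 penalty lam on the slope."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / (sxx + lam)
    return a, my - a * mx


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def pick(values, idx):
    """Select the entries of values at the given indices."""
    return [values[i] for i in idx]


# Synthetic toy data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(42)
x = [i / 10 for i in range(50)]
y = [2.0 * xi + 1.0 + rng.gauss(0.0, 0.3) for xi in x]

train, val, test = split_indices(len(x))

# Choose the hyperparameter using the validation set only.
best_lam, best_val_mae = None, float("inf")
for lam in (0.0, 0.1, 1.0, 10.0):
    a, b = fit_ridge_1d(pick(x, train), pick(y, train), lam)
    err = mae(pick(y, val), [a * xi + b for xi in pick(x, val)])
    if err < best_val_mae:
        best_lam, best_val_mae = lam, err

# Refit on the training set with the chosen penalty, then score on the test set.
a, b = fit_ridge_1d(pick(x, train), pick(y, train), best_lam)
test_pred = [a * xi + b for xi in pick(x, test)]
test_mae = mae(pick(y, test), test_pred)
test_rmse = rmse(pick(y, test), test_pred)
print(f"lam={best_lam}  test MAE={test_mae:.3f}  test RMSE={test_rmse:.3f}")
```

The test set is touched only once, after the penalty has been fixed, so the reported errors estimate performance on unseen data rather than on data that influenced a modelling choice.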
“…[3,9,19–25] In general, these achievements are rooted in the access to suitable large datasets, both theoretically and experimentally. [14–16,26–28] However, even though such big datasets (and access to them) are becoming common place, [3,27,29–32] they do not represent the datasets most materials researchers work with on a day-to-day basis. Within general experimental material research projects, researchers generally produce no more than a hand full of data points (c.q.…”
Section: Introduction (mentioning)
confidence: 99%
“…Materials informatics (MI) is the resulting field of research which utilizes statistical and machine learning (ML) approaches in combination with high-throughput computation to analyze the wealth of existing materials information and gain unique insights. [2–4] As this wealth has increased, practitioners of MI have increasingly turned to deep learning techniques to model and represent inorganic chemistry, resulting in approaches such as ElemNet, IRNet, CGCNN, SchNet and Roost. [5–9] In specific cases, [7,8,10–15] including CGCNN and SchNet, the compounds are represented using their chemical and structural information.…”
Section: Introduction (mentioning)
confidence: 99%
“…Artificial intelligence is, today, used in various fields of big data analysis, such as image analysis [23, 24]. Even in chemistry fields, which to date, have primarily been experimental, studies on material informatics have begun [25, 26, 27]. Some artificial intelligence applications in chemistry fields are utilized in material design [28, 29] and in analyzing various materials [30].…”
Section: Introduction (mentioning)
confidence: 99%