2020
DOI: 10.26434/chemrxiv.11879193
Preprint

Is Domain Knowledge Necessary for Machine Learning Materials Properties?

Abstract: New methods for describing materials as vectors in order to predict their properties using machine learning are common in the field of materials informatics. However, little is known about the comparative efficacy of these methods. This work sets out to make clear which featurization methods should be used across various circumstances. Our findings include, surprisingly, that simple one-hot encoding of elements can be as effective as traditional and new descriptors when using large amounts of data. H…


Cited by 6 publications (8 citation statements)
References 0 publications
“…Band gap, formation energy, shear modulus, bulk modulus, Debye temperature, thermal expansion, and thermal conductivity data were then collected from the ICSD catalogue of the AFLOW database [16]. Duplicate entries were removed, and each material property's formulae and ground-truth values were randomly partitioned into training, validation, and test sets (the full code is available in the GitHub repository [22]). Note that, for this work, the associated Crystallographic Information Files (CIF) were discarded.…”
Section: Data Acquisition
confidence: 99%
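The partitioning step in the statement above can be sketched as follows. This is a minimal illustration under assumed inputs, not the authors' actual code from repository [22]; the toy formulae, target values, and split fractions are hypothetical.

```python
# Hypothetical sketch: deduplicate (formula, value) records, then randomly
# partition them into training, validation, and test sets.
# All data and fractions here are illustrative assumptions.
import random

def split_data(records, val_frac=0.1, test_frac=0.1, seed=42):
    """Remove duplicate formulae, shuffle, and split into train/val/test."""
    seen, unique = set(), []
    for formula, value in records:
        if formula not in seen:           # keep first occurrence only
            seen.add(formula)
            unique.append((formula, value))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = int(len(unique) * test_frac)
    n_val = int(len(unique) * val_frac)
    test = unique[:n_test]
    val = unique[n_test:n_test + n_val]
    train = unique[n_test + n_val:]
    return train, val, test

# Toy dataset of (formula, property value) pairs -- purely illustrative.
data = [("SiO2", 8.9), ("GaN", 3.4), ("NaCl", 5.0), ("TiO2", 3.0),
        ("ZnO", 3.3), ("MgO", 7.8), ("AlN", 6.2), ("CdS", 2.4),
        ("GaAs", 1.4), ("InP", 1.3)]
train, val, test = split_data(data)
```

Splitting on formulae (rather than on structure files) matches the statement's note that the CIFs were discarded: only composition and target value enter the pipeline.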
“…The model was then tested on these withheld formulae. The code for these methods is also available on GitHub [22].…”
Section: Model Training
confidence: 99%
“…The preprocessing step of featurizing data is crucial for successful implementation of machine learning algorithms. Improper featurization of data can impact prediction and classification errors [30].…”
Section: The Featurization and Curation of AM Data
confidence: 99%
“…In general, a good ML project should do one or more of the following: screen or downselect candidate materials from a pool of known compounds for a given application or property, [1][2][3] acquire and process data to gain new insights, 4,5 conceptualize new modeling approaches, [6][7][8][9][10] or explore ML in materials-specific applications. 1,[11][12][13] Consider these points when you judge the applicability of ML for your project.…”
Section: Meaningful Machine Learning
confidence: 99%
“…For sufficiently large datasets and for more "capable" learning architectures like very deep, fully-connected networks 7,122 or novel attention-based architectures such as CrabNet, 6 feature engineering and the integration of domain knowledge (such as through the use of CBFVs) in the input data becomes irrelevant and does not contribute to better model performance compared to a simple one-hot encoding. 11 Therefore, due to the effort required to curate and evaluate domain knowledge-informed features specific to your research, you may find it more beneficial to seek out additional sources of data, use already-established featurization schemes, or use learning methods that don't require domain-derived features 6 instead.…”
Section: Choosing Appropriate Models and Features
confidence: 99%
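The one-hot element encoding that the statement above contrasts with CBFVs can be sketched in a few lines. This is an illustrative assumption of how such an encoding works, not code from the paper; the abbreviated element vocabulary and the simple formula parser (which handles only unnested formulas like "SiO2") are simplifications.

```python
# Minimal sketch of fractional one-hot element encoding for a chemical
# formula: each position in the vector corresponds to one element, and the
# value is that element's fraction of the composition.
# The element list and parser are simplified illustrative assumptions.
import re

# Abbreviated vocabulary for illustration; a real encoder would cover
# the full periodic table.
ELEMENTS = ["H", "C", "N", "O", "Na", "Mg", "Al", "Si", "Cl", "Ti",
            "Zn", "Ga", "As"]

def parse_formula(formula):
    """Return {element: count} for simple formulas like 'SiO2' or 'GaAs'."""
    counts = {}
    for el, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if el:
            counts[el] = counts.get(el, 0) + (int(num) if num else 1)
    return counts

def one_hot(formula):
    """Fractional one-hot vector over the fixed element vocabulary."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    return [counts.get(el, 0) / total for el in ELEMENTS]

vec = one_hot("SiO2")  # Si -> 1/3, O -> 2/3, all other entries 0
```

Note that this representation carries no chemical knowledge at all (no electronegativity, radii, or valence features, as CBFVs do), which is exactly why its competitiveness at large data sizes is the surprising result highlighted in the abstract.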