2021
DOI: 10.1088/1361-648x/ac1280
Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet

Abstract: As the number of novel data-driven approaches to materials science continues to grow, it is crucial to perform consistent quality, reliability and applicability assessments of model performance. In this paper, we benchmark the Materials Optimal Descriptor Network (MODNet) method and architecture against the recently released MatBench v0.1, a curated test suite of materials datasets. MODNet is shown to outperform current leaders on 4 of the 13 tasks, whilst closely matching the current leaders on a further 3 tasks…
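As a concrete illustration of the evaluation protocol, the sketch below runs a single MatBench v0.1 task through the fold-based API of the matbench Python package. It is a minimal sketch only: the dummy mean predictor stands in for a real model such as MODNet, and the task choice is arbitrary.

```python
# Minimal MatBench v0.1 evaluation loop using the `matbench` package.
# The "model" here is a placeholder that predicts the training-set mean.
import numpy as np
from matbench.bench import MatbenchBenchmark

# Restrict the benchmark to one small task for brevity.
mb = MatbenchBenchmark(autoload=False, subset=["matbench_expt_gap"])

for task in mb.tasks:
    task.load()
    for fold in task.folds:
        train_inputs, train_outputs = task.get_train_and_val_data(fold)
        test_inputs = task.get_test_data(fold, include_target=False)

        # Placeholder model: predict the training-set mean for every test entry.
        predictions = np.full(len(test_inputs), np.mean(train_outputs))

        # Record the fold so matbench computes the official per-task metrics.
        task.record(fold, predictions)

    print(task.scores)  # fold-wise and aggregated scores for this task
```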

Cited by 25 publications (20 citation statements) · References 46 publications
“…It is known that non-graph models such as Automatminer [54] and MODNet [62, 63] that perform extensive hyperparameter tuning and feature selection steps at the inner loop of NCV typically outperform GNNs on small datasets. Once the internal optimization is complete, such models are fit on the entire fold so that no validation data is left out.…”
Section: Results (mentioning)
confidence: 99%
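The nested cross-validation (NCV) workflow described above, with feature selection and hyperparameter tuning confined to the inner loop and the selected configuration refit on the entire outer training fold, can be sketched generically with scikit-learn. This is not the Automatminer or MODNet pipeline; the estimator, feature selector and parameter grid below are placeholder choices for illustration.

```python
# Generic nested cross-validation: feature selection and hyperparameter search
# run only in the inner loop; the best pipeline is then refit on the whole
# outer training fold (refit=True) before scoring on the held-out outer fold.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=300, n_features=50, noise=0.1, random_state=0)

pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_regression)),
    ("model", RandomForestRegressor(random_state=0)),
])
param_grid = {
    "select__k": [10, 25, 50],
    "model__n_estimators": [100, 300],
}

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(pipeline, param_grid, cv=inner_cv,
                      scoring="neg_mean_absolute_error")
outer_scores = cross_val_score(search, X, y, cv=outer_cv,
                               scoring="neg_mean_absolute_error")
print(-outer_scores.mean())  # nested-CV estimate of the MAE
```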
“…The GA keeps randomness, while giving more importance to local optima. Therefore, a satisfactory set of hyperparameters is found more quickly and at a reduced computational cost compared to the standard grid- or random-search previously used [31]. As is shown below, this approach results in a relative improvement of up to 12% on the Matbench tasks, compared to the previously used grid-search.…”
Section: MODNet (mentioning)
confidence: 99%
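To make the mechanism concrete, the toy genetic-algorithm loop below mixes exploitation of the fittest configurations (selection and elitism) with continued randomness (mutation). It is not the GA implemented in MODNet; the hyperparameter ranges, fitness function and GA settings are invented for illustration.

```python
# Toy genetic algorithm over two hyperparameters: selection exploits good
# configurations ("local optima") while mutation keeps the search random.
import random

def fitness(params):
    # Stand-in for the negative validation error of a model trained with
    # these hyperparameters; optimum placed near lr=0.01, n_units=128.
    lr, n_units = params["lr"], params["n_units"]
    return -((lr - 0.01) ** 2 * 1e4 + ((n_units - 128) / 128) ** 2)

def random_individual():
    return {"lr": 10 ** random.uniform(-4, -1),
            "n_units": random.choice([32, 64, 128, 256])}

def mutate(parent):
    child = dict(parent)
    if random.random() < 0.5:
        child["lr"] *= 10 ** random.uniform(-0.3, 0.3)
    if random.random() < 0.5:
        child["n_units"] = random.choice([32, 64, 128, 256])
    return child

population = [random_individual() for _ in range(20)]
for generation in range(15):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                              # keep the fittest
    children = [mutate(random.choice(parents)) for _ in range(15)]
    population = parents + children                       # elitism + mutation

print(max(population, key=fitness))
```

Unlike a fixed grid, each generation concentrates trials around configurations that already performed well, which is why far fewer model evaluations are typically needed to reach a comparable optimum.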
“…This is especially the case for the smaller experimental datasets. The new approach brings a significant increase in performance compared to the standard grid- or random-search that was adopted previously [31]. This shows the importance of hyperparameters for accurate generalization.…”
Section: MODNet (mentioning)
confidence: 99%
“…Additionally, they provide the tools to construct possibly thousands of features from calculations based on a material's composition, structure and electronic properties from DFT calculations, and have frameworks for visualization and automatic machine learning. To apply Matminer's featurization tools, we extend an existing implementation by Breuck et al [94], which was used to generate a supervised machine learning framework called the MODnet. The implementation by Breuck et al provides featurization for a material's composition, structure and atomic sites.…”
Section: Materials Informatics (mentioning)
confidence: 99%
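For readers unfamiliar with the Matminer featurization step mentioned above, the short example below turns chemical compositions into numerical features with a standard elemental-property preset. The preset and compositions are arbitrary choices; the MODNet-style pipeline referenced in the quote additionally uses structure- and site-based featurizers and a much larger feature set.

```python
# Composition-only matminer featurization example.
import pandas as pd
from pymatgen.core import Composition
from matminer.featurizers.composition import ElementProperty

df = pd.DataFrame({"composition": [Composition("Fe2O3"), Composition("SiO2")]})

# "magpie" preset: statistics of elemental properties (~130 features).
featurizer = ElementProperty.from_preset("magpie")
df = featurizer.featurize_dataframe(df, col_id="composition")

print(df.shape)  # original column plus the generated feature columns
```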