2022
DOI: 10.1039/d2dd00039c

Random projections and kernelised leave one cluster out cross validation: universal baselines and evaluation tools for supervised machine learning of material properties

Abstract: With machine learning being a popular topic in current computational materials science literature, creating representations for compounds has become common place. These representations are rarely compared, as evaluating their performance...
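The abstract's framing of random projections as a universal baseline representation can be illustrated with a short, hypothetical sketch: a fixed random matrix maps simple composition vectors to a lower-dimensional feature space. This assumes scikit-learn's GaussianRandomProjection and placeholder data, not the paper's actual featurisation or dimensions.

```python
# Minimal sketch of a random-projection baseline representation:
# project placeholder fractional-composition vectors to a lower-dimensional
# feature space with a fixed random matrix. Sizes and data are illustrative,
# not the configuration used in the paper.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
compositions = rng.random(size=(100, 118))            # placeholder element-fraction vectors
compositions /= compositions.sum(axis=1, keepdims=True)

projector = GaussianRandomProjection(n_components=64, random_state=0)
features = projector.fit_transform(compositions)      # baseline representation
print(features.shape)                                 # (100, 64)
```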

Cited by 7 publications (8 citation statements) | References 47 publications (135 reference statements)
“…We used four-fold cross-validation methods inside the training set to tune the hyperparameters. One of them was conventional k-fold CV, while the other three were modified with respect to the previous methods 44,[49][50][51][52] to enhance extrapolative performance, as shown in Supplementary Fig. 11.…”
Section: Validation Methods
Mentioning, confidence: 99%
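As a rough illustration of the cluster-based validation idea in the statement above (and of the leave-one-cluster-out scheme named in the paper's title), the sketch below clusters a placeholder feature matrix with KMeans and holds out one cluster at a time. The model, cluster count, and data are assumptions for illustration, not the cited authors' exact protocol.

```python
# Minimal sketch of leave-one-cluster-out cross-validation (LOCO-CV):
# cluster the feature matrix, then hold out one whole cluster per fold.
# X, y, the 5 clusters, and the kernel ridge model are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))           # placeholder compound representations
y = rng.normal(size=200)                 # placeholder property values

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=clusters):
    model = KernelRidge(kernel="rbf", alpha=1.0)
    model.fit(X[train_idx], y[train_idx])
    scores.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

print("LOCO-CV MAE per held-out cluster:", np.round(scores, 3))
```

Because each fold's test compounds come from a cluster absent from training, the resulting error estimates probe extrapolation rather than interpolation, which is the motivation behind the modified validation schemes described in the quoted statement.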
“…While these AD designs aim to differentiate reliable interpolation data from unreliable extrapolation data, they do not significantly enhance extrapolative performance. Meanwhile, various validation methods have been proposed to improve extrapolative performance by considering structure bias 44,49,50 and property bias 51,52. DL techniques such as generative models 37 and Neural Network Potential (NNP) models 53 have also shown potential for improving extrapolation tasks.…”
Section: Introduction
Mentioning, confidence: 99%
“…If we are interested in assessing model performance on new molecules, we can train a model with many reaction templates but use substructure splitting to create training, validation, and testing sets. Bemis-Murcko scaffolds [70] are commonly used to partition the data for this purpose, though clustering based on other input features or chemical similarity to measure extrapolation has also been explored [23,[71][72][73][74][75][76][77][78][79][80][81][82][83][84][85][86][87][88] as has quantifying domains of model applicability [89][90][91][92][93]. Scaffold splitting is not perfect, but by ensuring that molecules in the testing set are structurally different than those in the training set, it offers a much better assessment of generalizability than splitting randomly [17,24,67,[94][95][96][97][98][99][100][101][102][103][104][105][106][107][108][109]…”
Section: Interpolation vs. Extrapolation
Mentioning, confidence: 99%
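A hedged sketch of the Bemis-Murcko scaffold splitting mentioned in the statement above, assuming RDKit's MurckoScaffold utilities: molecules sharing a scaffold are kept in the same partition, so the test set contains scaffolds unseen during training. The example SMILES and the 80/20 ratio are illustrative, not taken from the cited work.

```python
# Minimal sketch of a Bemis-Murcko scaffold split: group molecules by scaffold
# SMILES, then assign whole scaffold groups to train or test so no scaffold
# appears in both partitions. Input molecules here are placeholders.
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

smiles = ["c1ccccc1O", "c1ccccc1N", "C1CCCCC1", "CCO", "c1ccc2ccccc2c1"]

scaffold_to_mols = defaultdict(list)
for s in smiles:
    scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=s)  # '' for acyclic molecules
    scaffold_to_mols[scaffold].append(s)

# Fill the training set with whole scaffold groups (largest first) until ~80%.
groups = sorted(scaffold_to_mols.values(), key=len, reverse=True)
train, test = [], []
for group in groups:
    (train if len(train) < 0.8 * len(smiles) else test).extend(group)

print("train:", train)
print("test:", test)
```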
“…While open questions undoubtedly exist surrounding interpolation vs. extrapolation as well as designing challenging out-of-sample test sets such as those containing new chemistry [26,28,[42][43][44][45][46][47][48][49], larger [9,50,51] and/or outlier [52] molecules/materials, or new sets appearing over time [53][54][55], this topic remains an ongoing community-wide discussion with no clear best practices, and will not be discussed further here.…”
Section: Data Splits
Mentioning, confidence: 99%