2023
DOI: 10.1038/s41524-023-01000-z

Small data machine learning in materials science

Abstract: This review discusses the dilemma of small data faced by materials machine learning. First, we analyze the limitations brought by small data. Then, the workflow of materials machine learning is introduced. Next, methods of dealing with small data are presented, including data extraction from publications, materials database construction, and high-throughput computations and experiments at the data-source level; modeling algorithms for small data and imbalanced learning at the algorithm level; acti…

Cited by 132 publications (65 citation statements)
References 109 publications
“…56 Similarly, it is important to draw a clear distinction between the relative notions of big and small data in computer and materials science. 57 Xu et al. 57 reported that most of the data employed in materials ML are still categorized as small data; however, the goal of an ML model still rests on its predictive skill, and that requirement is unchanged for small data. The modeler should use small data with caution, as the uncertainty tends to increase compared with big data.…”
Section: Methods
Citation type: mentioning; confidence: 99%
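To make that caution concrete, the following minimal sketch (an editorial illustration, not code from the cited works) scores a model with repeated cross-validation on a deliberately small synthetic dataset; the dataset, SVR settings, and sample size are all assumptions chosen only to show how wide the score spread becomes when data are scarce.

```python
# Illustrative only: quantify how unstable model scores become on small data.
from sklearn.datasets import make_regression
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.svm import SVR

# A deliberately small synthetic table (~60 samples) standing in for a
# typical materials property dataset.
X, y = make_regression(n_samples=60, n_features=8, noise=10.0, random_state=0)

model = SVR(kernel="rbf", C=10.0)

# Repeated k-fold resampling yields a distribution of R^2 scores; with small
# data the spread is wide, which is the uncertainty the citing authors warn about.
cv = RepeatedKFold(n_splits=5, n_repeats=20, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print(f"R^2 over resamples: mean={scores.mean():.2f}, std={scores.std():.2f}")
```

Reporting the standard deviation alongside the mean score is one simple way for a modeler to communicate the extra uncertainty that small data carry.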
“…To mitigate the common issues of overfitting and underfitting often encountered in small-data-set ML, we employed an approach of modeling with a minimal set of features while achieving a certain level of accuracy. Additionally, the ML algorithms chosen, namely SVM and LightGBM, are well suited for small data sets. Nevertheless, the realm of small-data-set ML still presents constraints.…”
Section: Conclusion and Outlooks
Citation type: mentioning; confidence: 99%
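As an editorial illustration of that strategy, and not the cited authors' code, the sketch below fits SVR and LightGBM regressors on a small synthetic dataset while restricting each model to a handful of selected features; all dataset sizes, feature counts, and hyperparameters are assumed.

```python
# Illustrative sketch: SVM (SVR) and LightGBM on a small dataset with a
# minimal feature subset to curb overfitting. Not from the cited paper.
from lightgbm import LGBMRegressor
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=80, n_features=20, n_informative=5,
                       noise=5.0, random_state=1)
cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Keep only the 5 strongest features before fitting each model.
for name, estimator in [("SVR", SVR(C=10.0)),
                        ("LightGBM", LGBMRegressor(n_estimators=200,
                                                   num_leaves=7,
                                                   min_child_samples=5))]:
    pipe = make_pipeline(StandardScaler(),
                         SelectKBest(f_regression, k=5),
                         estimator)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
    print(f"{name}: R^2 = {scores.mean():.2f} +/- {scores.std():.2f}")
```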
“…Additionally, the ML algorithms chosen, namely SVM and LightGBM, are well suited for small data sets. 81 Nevertheless, the realm of small-data-set ML still presents constraints. The challenge of predicting samples containing elements that occur at low frequency in the original data sets, caused by data imbalance, leads to a trust crisis.…”
Section: Conclusion and Outlooks
Citation type: mentioning; confidence: 99%
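One common remedy for the imbalance problem raised here is to reweight rarely occurring classes. The sketch below is a generic, assumed illustration using a class-weighted SVM on synthetic imbalanced labels, not a method taken from the cited study.

```python
# Illustrative only: an imbalanced label (e.g. a rarely occurring element
# class) handled by weighting classes inversely to their frequency.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# A roughly 9:1 split stands in for a low-frequency element class.
X, y = make_classification(n_samples=200, n_features=10, weights=[0.9, 0.1],
                           random_state=2)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
for weight in (None, "balanced"):
    clf = SVC(kernel="rbf", class_weight=weight)
    f1 = cross_val_score(clf, X, y, cv=cv, scoring="f1")
    print(f"class_weight={weight}: minority-class F1 = {f1.mean():.2f}")
```

Class weighting only reshapes the loss; when minority samples are extremely scarce, the trust issue the authors describe remains and usually calls for collecting more data or imbalanced-learning techniques such as oversampling.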
“…Support Vector Regression (SVR) was first introduced by Vapnik, Golowich, and Smola in 1997 as a variant of the Support Vector Machine (SVM). SVM is a supervised learning algorithm suited to the prediction of discrete values. The principle used in SVR is similar to that of SVM, and the aim of SVR is to find the best-fit line.…”
Section: Experimental Evaluations
Citation type: mentioning; confidence: 99%
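The following minimal sketch, using assumed synthetic data and hyperparameters for illustration only, shows the SVR idea described above: the model fits a function while tolerating deviations within an epsilon margin, which is the "best-fit line" the text refers to.

```python
# Minimal SVR illustration on synthetic 1-D data; not from the cited work.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0.0, 5.0, size=(40, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)

# epsilon sets the tube within which errors are not penalized.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)
svr.fit(X, y)
print("Predictions at x = 1, 2, 3:", svr.predict([[1.0], [2.0], [3.0]]))
```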