Experimentally and computationally [39][40][41][42][43][44][45][46][47][48][49][50] validated machine learning (ML) articles are sorted by training dataset size (1-100, 101-10 000, and 10 000+) into a comprehensive set summarizing legacy and recent advances in the field. The review emphasizes the interrelated fields of synthesis, characterization, and prediction. The 1-100 size range consists mostly of Bayesian optimization (BO) articles, whereas the 101-10 000 range consists mostly of support vector machine (SVM) articles. The articles often combine ML, feature selection (FS), adaptive design (AD), high-throughput (HiTp) techniques, and domain knowledge to enhance predictive performance and/or model interpretability. Grouping cross-validation (G-CV) techniques curb overly optimistic estimates of extrapolative predictive performance. Studies on smaller datasets that rely on AD are typically able to identify new materials with desired properties, but do so in a constrained design space. In larger datasets, the low-hanging fruit of materials optimization has typically already been discovered, and the models are generally less successful at extrapolating to new mate-