With the rapid development of artificial intelligence and machine learning (ML) methods, materials science is entering the era of data-driven materials informatics. ML models are the most crucial component, bridging material structure and properties, yet the prediction performance of different ML methods varies considerably across material systems. Herein, we evaluated twelve ML algorithms commonly used in the materials field, grouped into three categories (linear, kernel, and nonlinear methods). Halide perovskites were chosen as an example system to evaluate the fitting performance of the different models. We constructed a dataset of 540 halide perovskites and 72 features, with formation energy and bandgap as target properties. We found that the different categories of ML models show similar trends for both target properties. The difference between models is large for the formation energy, with the coefficient of determination (R²) ranging from 0.69 to 0.953, whereas for the bandgap the models perform more similarly, with R² ranging from 0.941 to 0.997. The nonlinear-ensemble models show the best fitting performance for both the formation energy and the bandgap, which indicates that a nonlinear-ensemble model, constructed by combining multiple weak learners, effectively describes the nonlinear relationship between material features and target property. In addition, the eXtreme gradient boosting decision tree model gives the best results among all the models and enabled the identification of two new descriptors that are crucial for formation energy and bandgap. Our work provides useful guidance for selecting effective machine learning methods in data-mining studies of specific material systems.
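As a minimal sketch of how models can be ranked by the coefficient of determination, the snippet below computes it from its standard definition, 1 − SS_res/SS_tot. The function name `r2_score`, the model names, and all numerical values are hypothetical illustrations; they are not data or code from the study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical target values (e.g. bandgaps in eV) and two made-up models.
y_true = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.1, 1.9, 3.2, 3.8]  # close fit
pred_b = [2.5, 2.5, 2.5, 2.5]  # always predicts the mean of y_true

scores = {
    "model_a": r2_score(y_true, pred_a),
    "model_b": r2_score(y_true, pred_b),
}
best = max(scores, key=scores.get)  # model with the highest score
```

A model that always predicts the mean scores 0, so values close to 1, such as those reported for the bandgap, indicate that almost all of the variance in the target is captured.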
The considerable thermal expansion of halide perovskites is one of the challenges to device stability, yet its physical origin and strategies for modulating it remain unclear. Herein, we report first-principles calculations of the thermal properties of halide perovskites at 300 K, using oxides as a reference. We found that the large thermal expansion of halide perovskites can mainly be attributed to their low bulk modulus and volumetric heat capacity, which result from the soft crystal lattice, whereas composition-dependent anharmonicity emerges as the most important factor in determining thermal expansion among compounds with the same structure. We discovered that the thermal expansion of halide perovskites can be decreased by weakening the B−X bond to promote octahedral anharmonicity. We further proposed an effective descriptor for the thermal expansion coefficient of halide perovskites, with a Pearson correlation coefficient of nearly −80%. Our findings provide insights into the underlying mechanisms and chemical trends in the thermal expansion behavior of halide perovskites.
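For readers unfamiliar with the metric, the following sketch shows how a Pearson correlation coefficient between a descriptor and a thermal expansion coefficient could be computed. The function name `pearson_r` and the two value lists are made up for illustration; they are not the descriptor or the data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("undefined when either sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical descriptor values and thermal expansion coefficients
# (arbitrary units); a larger descriptor tends to go with smaller expansion.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
expansion = [5.0, 3.0, 4.0, 2.0, 1.0]

r = pearson_r(descriptor, expansion)  # -0.9 for these made-up values
```

A coefficient near −1 means the descriptor and the expansion coefficient are strongly anticorrelated, which is the sense in which a value of nearly −80% marks a useful descriptor.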
Open-framework structures (e.g., ScF3 and Sc2W3O12) exhibit significant potential for thermal expansion tailoring owing to their high atomic vibrational degrees of freedom and the diverse connectivity between their polyhedral units, displaying positive or negative thermal expansion (PTE/NTE) coefficients at a given temperature. Although several physical mechanisms have been proposed to explain the origin of NTE, an accurate mapping between structural and compositional properties and thermal expansion behavior is still lacking. This deficiency impedes the rapid evaluation of thermal expansion properties and hinders the design and development of such materials. We developed an algorithm for identifying and characterizing the connection patterns of structural units in open-framework structures and constructed a descriptor set for the thermal expansion properties of this system, composed of connectivity and elemental information. Aided by machine learning (ML) algorithms, our descriptor can effectively learn thermal expansion behavior from a small dataset of 246 samples collected from experimental data reported in the literature. The trained model distinguishes the thermal expansion behavior (PTE/NTE) with an accuracy of 92%. Additionally, our model predicted six new thermodynamically stable NTE materials, which were validated through first-principles calculations. Our results demonstrate that developing effective descriptors closely related to thermal expansion properties enables ML models to make accurate predictions even on small datasets, providing a new perspective for understanding the relationship between connectivity and thermal expansion properties in open-framework structures. The datasets used to support these results are available on Science Data Bank at https://doi.org/10.57760/sciencedb.j00113.00100 (https://www.scidb.cn/s/buMzeu).