As materials data sets grow in size and scope, data mining and statistical learning methods are becoming more important for analyzing these data sets and building predictive models. This manuscript introduces matminer, an open-source, Python-based software platform to facilitate data-driven methods of analyzing and predicting materials properties. Matminer provides modules for retrieving large data sets from external databases such as the Materials Project, Citrination, Materials Data Facility, and Materials Platform for Data Science. It also provides implementations for an extensive library of feature extraction routines developed by the materials community, with 44 featurization classes that can generate thousands of individual descriptors and combine them into mathematical functions. Finally, matminer provides a visualization module for producing interactive, shareable plots. These functions are designed to integrate closely with machine learning and data analysis packages already developed and in use by the Python data science community. We explain the structure and logic of matminer, describe its various modules, and showcase several examples of how matminer can be used to collect data, reproduce data mining studies reported in the literature, and test new methodologies.
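To make the idea of feature extraction concrete, the following is a minimal sketch of a composition-based descriptor, written in plain Python. It does not use matminer's API: the function names `parse_formula` and `featurize_composition` are hypothetical, the formula parser handles only simple formulas without parentheses, and the electronegativity table is a small illustrative subset of Pauling values.

```python
import re

# Pauling electronegativities for a few elements (illustrative subset only).
ELECTRONEGATIVITY = {
    "Na": 0.93, "Cl": 3.16, "Fe": 1.83, "O": 3.44, "Si": 1.90, "Ti": 1.54,
}


def parse_formula(formula):
    """Return {element: atomic fraction} for a simple formula such as 'Fe2O3'.

    Hypothetical helper: parentheses and charges are not supported.
    """
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}


def featurize_composition(formula):
    """Map a formula to two descriptors: the fraction-weighted mean and the
    range (max minus min) of elemental electronegativity."""
    fractions = parse_formula(formula)
    values = [ELECTRONEGATIVITY[element] for element in fractions]
    mean = sum(frac * ELECTRONEGATIVITY[element] for element, frac in fractions.items())
    return {"mean_electronegativity": mean,
            "range_electronegativity": max(values) - min(values)}


if __name__ == "__main__":
    print(featurize_composition("Fe2O3"))
```

Each material becomes a fixed-length row of numbers, which is the form that general-purpose machine learning packages expect as input.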
We present an overview and preliminary analysis of computed thermoelectric properties for more than 48,000 inorganic compounds from the Materials Project (MP). We compare our calculations with available experimental data to evaluate the accuracy of different approximations in predicting thermoelectric properties. We observe fair agreement between experiment and computation for the maximum Seebeck coefficient determined with MP band structures and the BoltzTraP code under a constant relaxation time approximation (R² = 0.79). We additionally find that scissoring the band gap to the experimental value improves the agreement. We find that power factors calculated with a constant and universal relaxation time approximation show much poorer agreement with experiment (R² = 0.33). We test two minimum thermal conductivity models (Clarke and Cahill-Pohl), finding that both models reproduce measured values fairly accurately (R² = 0.82) using parameters obtained from computation. Additionally, we analyze this data set to gain broad insights into the effects of chemistry, crystal structure, and electronic structure on thermoelectric properties. For example, our computations indicate that oxide band structures tend to produce lower power factors than those of sulfides, selenides, and tellurides, even under the same doping and relaxation time constraints. We also list families of compounds identified to possess high valley degeneracies. Finally, we present a clustering analysis of our results. We expect that these studies should help guide and assess future high-throughput computational screening studies of thermoelectric materials.
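The R² values quoted above summarize how well computed quantities track measured ones. The abstract does not state which definition of R² was used, so the sketch below assumes the coefficient of determination of measured values against computed predictions; the function name `r_squared` and the numbers in the usage example are made up purely for illustration and are not data from the study.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: 'predicted' holds computed values used directly as
    predictions of 'measured', with no fitted slope or intercept.
    """
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Placeholder numbers, not taken from the paper.
    measured = [1.0, 2.0, 3.0, 4.0]
    computed = [1.1, 1.9, 3.2, 3.8]
    print(r_squared(measured, computed))
```

Under this definition, a value of 1 means the computed values match the measurements exactly, and a value of 0 means they do no better than predicting the mean of the measurements.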