Materials informatics approaches that promise fast, efficient discovery and optimization of functional inorganic materials have attracted increasing attention. Advancing the field requires technical breakthroughs, and considerable effort has gone into developing materials descriptors that encode or represent characteristics of crystalline solids such as chemical composition, crystal structure, and electronic structure. We propose a general representation scheme for crystalline solids that lifts restrictions on atom ordering, cell periodicity, and system cell size. The scheme combines structural descriptors, obtained by directly binning real-valued Voronoi-tessellation features, with atomic/chemical descriptors based on the electronegativity of the elements in the crystal. We compare it with a radial distribution function (RDF) feature vector in terms of predictive accuracy for material properties computed with density functional theory (DFT): cohesive energy (CE), density (d), electronic band gap (BG), and decomposition energy (Ed). The proposed feature vector from Voronoi real-value binning generally outperforms the RDF-based one for predicting these properties. Together with the electronegativity-based features, Voronoi-tessellation features derived from second-nearest-neighbor information in a given crystal structure contribute significantly to the predictions.
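The idea of binning real-valued per-atom features into a fixed-length vector can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `binned_feature_vector`, the bin count, the value range, and the sample values are all hypothetical, and the values stand in for some Voronoi-cell quantity computed elsewhere. The sketch only shows why a normalized histogram does not depend on atom ordering or on how many times the cell is repeated.

```python
import numpy as np

def binned_feature_vector(values, n_bins=8, value_range=(0.0, 4.0)):
    """Histogram per-atom real values into a fixed-length, normalized vector.

    Assumption: values outside `value_range` are simply ignored by np.histogram.
    """
    counts, _ = np.histogram(np.asarray(values, dtype=float),
                             bins=n_bins, range=value_range)
    total = counts.sum()
    # Normalizing by the number of binned atoms removes the dependence on cell size.
    return counts / total if total > 0 else counts.astype(float)

# Hypothetical per-atom values (placeholders, not real data).
per_atom = [1.2, 1.2, 2.7, 3.1]

v = binned_feature_vector(per_atom)
v_permuted = binned_feature_vector(per_atom[::-1])   # same atoms, different order
v_supercell = binned_feature_vector(per_atom * 2)    # same cell repeated twice
```

Under these assumptions all three vectors are equal, which is the invariance property the abstract refers to.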
Algebraic topology methods have recently played an important role in the statistical analysis of data with complicated geometric structure, such as shapes, linked twist maps, and material data. Among them, persistent homology is a well-known tool for extracting robust topological features, which it outputs as persistence diagrams (PDs). However, PDs are point multisets and cannot be used directly in machine learning algorithms designed for vector data. One emerging approach to this problem is to use kernel methods, for which an appropriate geometry on PDs is an important factor in measuring the similarity of PDs. A popular geometry for PDs is the Wasserstein metric. However, the Wasserstein distance is not negative definite, so it is difficult to build positive definite kernels on it without approximation. In this work, we rely on the alternative Fisher information geometry to propose a positive definite kernel for PDs without approximation, namely the Persistence Fisher (PF) kernel. We then analyze the eigensystem of the integral operator induced by the proposed kernel for kernel machines. Based on that analysis, we derive generalization error bounds via covering numbers and Rademacher averages for kernel machines with the PF kernel. Additionally, we show that the proposed kernel has desirable properties such as stability and infinite divisibility. Furthermore, we propose an approximation of the kernel with bounded error whose time complexity is linear in the number of points in the PDs. Through experiments on many different tasks and various benchmark datasets, we illustrate that the PF kernel compares favorably with other baseline kernels for PDs.
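A kernel of this kind can be sketched in a few lines. The abstract does not state the formula, so the sketch rests on labeled assumptions: each diagram is augmented with the diagonal projections of the other diagram's points, smoothed with isotropic Gaussians of width `sigma`, and normalized on the finite set of all points involved; the Fisher information distance is then taken as the arccosine of the Bhattacharyya coefficient, and the kernel as `exp(-t * distance)`. The function and parameter names are hypothetical, and the code assumes non-empty diagrams given as arrays of (birth, death) pairs.

```python
import numpy as np

def _diag_projection(pd):
    # Project each (birth, death) point onto the diagonal birth == death.
    mid = (pd[:, 0] + pd[:, 1]) / 2.0
    return np.column_stack([mid, mid])

def _smoothed_density(points, support, sigma):
    # Sum of isotropic Gaussians centered at `points`, evaluated on `support`
    # and normalized so the values sum to 1 (a discrete probability vector).
    sq = ((support[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    rho = np.exp(-sq / (2.0 * sigma ** 2)).sum(axis=1)
    return rho / rho.sum()

def persistence_fisher_kernel(pd_i, pd_j, sigma=1.0, t=1.0):
    """Sketch of a Fisher-information-based kernel between two diagrams."""
    pd_i = np.asarray(pd_i, dtype=float)
    pd_j = np.asarray(pd_j, dtype=float)
    # Augment each diagram with the other's diagonal projections (assumed form).
    aug_i = np.vstack([pd_i, _diag_projection(pd_j)])
    aug_j = np.vstack([pd_j, _diag_projection(pd_i)])
    support = np.vstack([aug_i, aug_j])
    rho_i = _smoothed_density(aug_i, support, sigma)
    rho_j = _smoothed_density(aug_j, support, sigma)
    # Bhattacharyya coefficient, clipped to guard arccos against rounding.
    bc = np.clip(np.sqrt(rho_i * rho_j).sum(), -1.0, 1.0)
    d_fim = np.arccos(bc)
    return float(np.exp(-t * d_fim))

# Toy diagrams (placeholders, not real data).
pd_a = np.array([[0.0, 1.0], [0.5, 2.0]])
pd_b = np.array([[0.2, 0.9]])
k_aa = persistence_fisher_kernel(pd_a, pd_a)
k_ab = persistence_fisher_kernel(pd_a, pd_b)
k_ba = persistence_fisher_kernel(pd_b, pd_a)
```

In this sketch a diagram compared with itself gives a value of essentially 1, and the value decreases as the smoothed diagrams diverge. The pairwise Gaussian evaluation is quadratic in the number of points; the linear-time approximation mentioned in the abstract is not reproduced here.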