We analyze an ensemble of organophosphorus compounds to form an unbiased characterization of the information encoded in their X-ray absorption near-edge structure (XANES) and valence-to-core X-ray emission spectra (VtC-XES). Unsupervised machine learning, specifically cluster analysis in the Uniform Manifold Approximation and Projection (UMAP) embedding, lets chemical classes emerge from the data and reveals spectral sensitivity to coordination, oxidation state, aromaticity, intramolecular hydrogen bonding, and ligand identity. Subsequently, we implement supervised machine learning via Gaussian process classifiers to quantify confidence in predictions, which match our initial qualitative assessments of the clustering. The results further support using unsupervised machine learning as a precursor to supervised machine learning, an approach we term Unsupervised Validation of Classes (UVC) and one that applies beyond the present case of X-ray spectroscopies.
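The workflow described in this abstract (embed, cluster, then classify with a confidence estimate) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy single-peak spectra, the two class labels, and the use of scikit-learn's `PCA` as a stand-in for the UMAP embedding so that the example depends on scikit-learn only. It is not the authors' pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
energy = np.linspace(0.0, 10.0, 100)  # hypothetical energy grid, arbitrary units


def toy_spectrum(center):
    """Single Gaussian peak plus noise; a stand-in for a real spectrum."""
    peak = np.exp(-0.5 * ((energy - center) / 0.5) ** 2)
    return peak + rng.normal(0.0, 0.02, energy.size)


# Two hypothetical chemical classes that differ in peak position.
centers = np.array([4.0, 6.0])
labels = np.repeat([0, 1], 40)
spectra = np.array(
    [toy_spectrum(centers[c] + rng.normal(0.0, 0.1)) for c in labels]
)

# Step 1 (unsupervised): embed the spectra in two dimensions.
# PCA is used here only as a stand-in for the UMAP embedding.
embedding = PCA(n_components=2).fit_transform(spectra)

# Step 2 (unsupervised): cluster and check whether the proposed classes
# are supported by the data before any supervised training.
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embedding)
ari = adjusted_rand_score(labels, cluster_ids)

# Step 3 (supervised): a Gaussian process classifier returns class
# probabilities, which serve as a per-sample confidence estimate.
X_train, X_test, y_train, y_test = train_test_split(
    embedding, labels, test_size=0.25, stratify=labels, random_state=0
)
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), random_state=0)
gpc.fit(X_train, y_train)
proba = gpc.predict_proba(X_test)
confidence = proba.max(axis=1)
accuracy = gpc.score(X_test, y_test)

print(f"cluster/label agreement (ARI): {ari:.2f}")
print(f"held-out accuracy: {accuracy:.2f}, mean confidence: {confidence.mean():.2f}")
```

In this sketch the adjusted Rand index plays the role of the validation step: the classifier is trained only on classes that the clustering already separates.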
As spectral imaging techniques are becoming more prominent in science, advanced image segmentation algorithms are required to identify appropriate domains in these images. We present a version of image segmentation called manifold projection image segmentation (MPIS) that is generally applicable to a broad range of systems without the need for training because MPIS uses unsupervised machine learning with a few physically motivated hyperparameters. We apply MPIS to nano-XANES imaging, where X-ray Absorption Near Edge Structure (XANES) spectra are collected with nanometer spatial resolution. We show the superiority of manifold projection over linear transformations, such as the commonly used Principal Component Analysis (PCA). Moreover, MPIS maintains accuracy while reducing computation time and sensitivity to noise compared to the standard nano-XANES imaging analysis procedure. Finally, we demonstrate how multimodal information, such as X-ray Fluorescence (XRF) data and spatial location of pixels, can be incorporated into the MPIS framework. We propose that MPIS is adaptable for any spectral imaging technique, including Scanning Transmission X-ray Microscopy (STXM), where the length scale of domains is larger than the resolution of the experiment.
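The general idea of segmenting a spectral image through a manifold projection can be sketched as follows. This is a hedged illustration, not the MPIS implementation: the toy image stack, the choice of scikit-learn's `TSNE` as the manifold projection, `KMeans` as the clustering step, and the function names `toy_image_stack` and `segment` are all assumptions introduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE


def toy_image_stack(ny=16, nx=16, n_energy=50, seed=0):
    """Build a hypothetical (ny, nx, n_energy) stack with two spatial domains."""
    rng = np.random.default_rng(seed)
    energy = np.linspace(0.0, 10.0, n_energy)
    truth = np.zeros((ny, nx), dtype=int)
    truth[:, nx // 2:] = 1  # right half of the image belongs to domain 1
    peak_pos = np.array([4.0, 6.0])[truth]  # per-pixel peak position
    stack = np.exp(
        -0.5 * ((energy[None, None, :] - peak_pos[:, :, None]) / 0.5) ** 2
    )
    stack += rng.normal(0.0, 0.02, size=stack.shape)
    return stack, truth


def segment(stack, n_domains=2, perplexity=30.0, seed=0):
    """Project each pixel's spectrum to 2-D, cluster, and return a label map."""
    ny, nx, n_energy = stack.shape
    pixels = stack.reshape(ny * nx, n_energy)  # one row per pixel spectrum
    embedding = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(pixels)
    cluster_ids = KMeans(
        n_clusters=n_domains, n_init=10, random_state=seed
    ).fit_predict(embedding)
    return cluster_ids.reshape(ny, nx)


stack, truth = toy_image_stack()
label_map = segment(stack)

# Cluster numbering is arbitrary, so score agreement up to a label swap.
agreement = np.mean(label_map == truth)
agreement = max(agreement, 1.0 - agreement)
print(f"pixel agreement with the toy ground truth: {agreement:.2f}")
```

No labeled training data enters the sketch; the only tunable inputs are the number of domains and the neighborhood size (`perplexity`), which loosely mirrors the abstract's point about a small set of hyperparameters.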
We report a comprehensive computational study of unsupervised machine learning for extraction of chemically relevant information in X-ray absorption near-edge structure (XANES) and in valence-to-core X-ray emission spectra (VtC-XES) for classification of a broad ensemble of sulforganic molecules. By progressively relaxing the constraining assumptions of the unsupervised machine learning algorithm, moving from principal component analysis to a variational autoencoder to t-distributed stochastic neighbor embedding (t-SNE), we find improved sensitivity to steadily more refined chemical information. Surprisingly, even in merely two dimensions, t-SNE distinguishes not just oxidation state and general sulfur bonding environment but also the aromaticity of the bonding radical group with 87% accuracy, and it identifies even finer details in electronic structure within aromatic or aliphatic sub-classes. We find that the chemical information in XANES and VtC-XES is very similar, although the two techniques show an unexpected tendency to differ in sensitivity within a given molecular class.
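The first and most constrained step of that progression, principal component analysis, can be sketched with plain NumPy. The toy spectra, the two classes, and the helper names `make_toy_spectra` and `pca_project` are assumptions made for illustration; the variational autoencoder and t-SNE steps are not shown.

```python
import numpy as np


def make_toy_spectra(n_per_class=50, n_points=200, seed=0):
    """Hypothetical spectra: one Gaussian peak whose position depends on class."""
    rng = np.random.default_rng(seed)
    energy = np.linspace(0.0, 10.0, n_points)
    spectra, labels = [], []
    for label, center in enumerate((4.0, 6.0)):
        for shift in rng.normal(0.0, 0.1, size=n_per_class):
            peak = np.exp(-0.5 * ((energy - (center + shift)) / 0.5) ** 2)
            spectra.append(peak + rng.normal(0.0, 0.02, size=n_points))
            labels.append(label)
    return np.array(spectra), np.array(labels)


def pca_project(X, n_components=2):
    """PCA via SVD of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # coordinates on the PCs
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, explained[:n_components]


spectra, labels = make_toy_spectra()
scores, explained = pca_project(spectra)

# Separation of the two toy classes along the first principal component.
pc1_gap = abs(scores[labels == 0, 0].mean() - scores[labels == 1, 0].mean())
print(f"explained variance (PC1, PC2): {explained.round(3)}")
print(f"class separation along PC1: {pc1_gap:.2f}")
```

PCA restricts the embedding to linear combinations of the input channels. That restriction is the kind of constraining assumption the abstract describes relaxing when it moves to the nonlinear methods.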