Figure 1: The ElectroLens user interface (UI) visualizing the electron cloud of a CH₃NO₂ molecule. The UI has two main parts: 3D viewer(s) on the left and 2D plots on the right, with option boxes in each view for customization. (A) The 3D viewer for Cartesian space. ElectroLens uses a "point cloud" to mimic the electron cloud: the density of points corresponds to the electron density, and the color corresponds to the energy density in this case. Atoms are shown with a ball-and-stick model, where atom types are denoted by white (hydrogen), grey (carbon), blue (nitrogen), and red (oxygen) spheres. (B) 2D plots for exploring and plotting additional features. The upper plot is a correlation plot used to identify potentially interesting combinations of features. The lower plot is a scatter plot of two features (in log scale), where color encodes the number of data points represented. (C) Selecting a region of the scatter plot highlights the corresponding points in the 3D viewer and in the other scatter plots. This case illustrates the connection between the features (right) and the chemical concept of a C-N bond (left).
ABSTRACT
In recent years, machine learning (ML) has gained significant popularity in chemical informatics and electronic structure theory. These techniques often require researchers to engineer abstract "features" that encode chemical concepts in a mathematical form compatible with the input of machine-learning models. However, no existing tool connects these abstract features back to the actual chemical system, which makes it difficult to diagnose failures and to build intuition about what the features mean. To tackle this problem, we present ElectroLens, a new visualization tool for high-dimensional, spatially resolved features. The tool visualizes high-dimensional data sets of atomistic and electron-environment features through a series of linked 3D views and 2D plots. Through interactive selection, it connects derived features to their corresponding regions in 3D. It is built to be scalable and to integrate with existing infrastructure.
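
The linking between 2D feature plots and 3D views can be pictured with a small sketch. This is not the ElectroLens implementation; it is a minimal illustration that assumes every data point carries both a Cartesian position and a dictionary of feature values, so that a rectangular selection in feature space yields a set of indices that any other view can reuse. The names `Point`, `brush`, and `highlighted_positions`, as well as the toy feature values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """One spatially resolved data point (hypothetical layout)."""
    x: float
    y: float
    z: float
    features: dict  # feature name -> value at this position


def brush(points, x_feature, y_feature, x_range, y_range):
    """Return indices of points inside a rectangular scatter-plot selection."""
    selected = set()
    for i, p in enumerate(points):
        fx = p.features[x_feature]
        fy = p.features[y_feature]
        if x_range[0] <= fx <= x_range[1] and y_range[0] <= fy <= y_range[1]:
            selected.add(i)
    return selected


def highlighted_positions(points, selected):
    """Map a shared index selection back to Cartesian coordinates."""
    return [(points[i].x, points[i].y, points[i].z) for i in sorted(selected)]


# Toy data: made-up values for illustration, not taken from any calculation.
points = [
    Point(0.0, 0.0, 0.0, {"density": 0.90, "gradient": 0.10}),
    Point(0.7, 0.0, 0.0, {"density": 0.35, "gradient": 0.02}),
    Point(1.5, 0.0, 0.0, {"density": 0.80, "gradient": 0.12}),
    Point(3.0, 1.0, 0.0, {"density": 0.01, "gradient": 0.00}),
]

# Select a region of the (density, gradient) scatter plot ...
selected = brush(points, "density", "gradient", (0.3, 0.4), (0.0, 0.05))
# ... and look up which 3D positions should be highlighted.
print(highlighted_positions(points, selected))
```

Because the selection is stored as indices rather than as coordinates or feature values, the same set can drive highlighting in every linked view, whichever features that view happens to plot.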