Algorithmically identifying meaningful similarities among
an assortment of molecules is a critical chemical problem, and one
which is only gaining in relevance as data-driven chemistry continues
to progress. This challenge can be addressed effectively by reformulating
the problem in terms of information theory and cluster-based supervised
classification, and by applying key concepts, particularly information
entropy and mutual information. These concepts
are combined with unsupervised learning atop learned chemical spaces
to generate meaningful labels for arbitrary collections of molecules.
An open-source and highly extensible codebase is provided to undertake
these experiments, demonstrate the viability of the approach on known
clusters, and glean insights into the learned representations of chemical
space within message-passing neural networks, an architecture that is
not readily interpretable. This approach facilitates interoperability
between human chemical knowledge and algorithmically derived insights,
which will become more prevalent in the coming years.
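
As a minimal illustration of the two information-theoretic quantities named above, the sketch below computes Shannon entropy and mutual information between cluster assignments and binary molecular features. It is not taken from the provided codebase: the function names, the toy cluster assignments, and the feature flags (`has_ring`, `has_halogen`) are hypothetical and chosen only to show how a feature that tracks cluster membership scores higher than one that does not.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    joint = list(zip(xs, ys))
    return entropy(xs) + entropy(ys) - entropy(joint)


# Hypothetical toy data (assumption, not from the source):
# cluster assignments for six molecules and two binary feature flags.
clusters = [0, 0, 0, 1, 1, 1]
has_ring = [1, 1, 1, 0, 0, 0]      # perfectly tracks cluster membership
has_halogen = [1, 0, 1, 0, 1, 0]   # only weakly related to cluster membership

mi_ring = mutual_information(clusters, has_ring)
mi_halogen = mutual_information(clusters, has_halogen)

# Rank candidate features by how informative they are about the clusters;
# the top-ranked feature is a plausible label for what separates them.
ranked = sorted(
    [("has_ring", mi_ring), ("has_halogen", mi_halogen)],
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```

Under these assumptions, the feature with the highest mutual information with the cluster assignment is the most natural candidate label for what distinguishes the clusters. A real pipeline would draw both the clusters and the candidate features from the learned chemical space rather than from hand-written lists.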