Deep learning and graph-based models have gained popularity in various life science applications such as property modeling, where they achieve state-of-the-art performance. However, quantifying the prediction uncertainty of these models has received less attention and has not been applied in the small-dataset regime, which characterizes many molecular properties relevant to chemical engineering. In this work, we use two graph-based models to predict the critical temperature, pressure, and volume, and apply three techniques (the bootstrap, the ensemble, and the dropout) to quantify the prediction uncertainty. The overall model confidence is evaluated using the coverage. The results suggest that graph-based models outperform current group-contribution-based property modeling techniques while eliminating the tedious task of developing molecular descriptors.
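As a rough illustration of the ensemble and coverage ideas mentioned above, the following minimal sketch builds a prediction interval from the spread of ensemble members and then measures coverage. It assumes that coverage means the fraction of compounds whose measured value falls inside the predicted interval, and that a Gaussian interval (mean ± 1.96 standard deviations) is used; neither assumption is confirmed by the abstract. The function names and all numbers are hypothetical.

```python
import statistics


def ensemble_interval(member_predictions, z=1.96):
    """Return (mean, lower, upper) for one compound.

    Assumption: the interval is mean +/- z * sample standard deviation
    of the ensemble members' predictions.
    """
    mean = statistics.fmean(member_predictions)
    spread = statistics.stdev(member_predictions)
    return mean, mean - z * spread, mean + z * spread


def coverage(y_true, intervals):
    """Fraction of true values that lie inside their [lower, upper] interval."""
    hits = sum(1 for y, (_, lo, hi) in zip(y_true, intervals) if lo <= y <= hi)
    return hits / len(y_true)


# Hypothetical critical-temperature predictions (K) from a 4-member ensemble,
# one inner list per compound. These are illustrative values, not real data.
ensemble_preds = [
    [505.0, 509.0, 507.0, 511.0],
    [640.0, 642.0, 641.0, 643.0],
    [420.0, 424.0, 418.0, 422.0],
]
y_true = [508.0, 655.0, 421.0]  # hypothetical measured values

intervals = [ensemble_interval(p) for p in ensemble_preds]
cov = coverage(y_true, intervals)
print(f"coverage = {cov:.2f}")
```

A coverage close to the nominal level of the interval (here 95%) would suggest well-calibrated uncertainty estimates, while a much lower value would indicate overconfident intervals.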
Quantitative structure–property relationships (QSPRs) are
important tools to facilitate and accelerate the discovery of compounds
with desired properties. While many QSPRs have been developed, they
are associated with various shortcomings such as a lack of generalizability
and modest accuracy. Although various machine-learning and deep-learning
techniques have been integrated into such models, this has introduced
another shortcoming: a lack of transparency and interpretability.
In this work, two interpretable graph neural network
(GNN) models (attentive group-contribution (AGC) and group-contribution-based
graph attention (GroupGAT)) are developed by integrating fundamentals
using the concept of group contributions (GC). The interpretability
consists of highlighting the substructure with the highest attention
weights in the latent representation of the molecules using the attention
mechanism. The proposed models outperformed classical group-contribution
models, as well as various other GNN models, in describing the aqueous
solubility, melting point, and enthalpies of formation, combustion, and
fusion of organic compounds. The insights provided are consistent with
those obtained from the semiempirical GC models, confirming that the
proposed framework can highlight the substructures of a molecule that
are important for a specific property.
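To make the interpretability idea concrete, here is a minimal sketch of attention-weighted group contributions: raw scores per group are normalized with a softmax, the prediction is the weighted sum of per-group contributions, and the group with the largest weight is the one highlighted. This is not the AGC or GroupGAT architecture; the function names, group labels, scores, and contribution values are all hypothetical, and in a trained model the scores and contributions would be learned rather than fixed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_group_sum(groups, scores, contributions):
    """Combine group contributions using attention weights.

    Returns the weighted-sum prediction, the (group, weight) pairs,
    and the group with the highest attention weight.
    """
    weights = softmax(scores)
    prediction = sum(w * c for w, c in zip(weights, contributions))
    weighted_groups = list(zip(groups, weights))
    top_group = max(weighted_groups, key=lambda pair: pair[1])[0]
    return prediction, weighted_groups, top_group


# Hypothetical fragmentation of a small alcohol into functional groups.
groups = ["CH3", "CH2", "OH"]
scores = [0.2, 0.1, 1.5]          # hypothetical raw attention scores
contributions = [-0.5, -0.3, 2.0]  # hypothetical per-group contributions

prediction, weighted_groups, top_group = attentive_group_sum(
    groups, scores, contributions
)
print("highlighted group:", top_group)
```

Because the weights sum to one, they can be read as the relative importance of each substructure for the predicted property, which is what allows comparison with the contributions of a classical GC model.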