Cocrystal engineering, an effective way to modify solid-state
properties, has attracted great interest across diverse materials
fields, and cocrystal density is an important property closely
correlated with material function. To accurately predict cocrystal
density, we develop a graph neural network (GNN)-based deep learning
framework by considering three key factors of machine learning (data
quality, feature representation, and model architecture). The results
show that different stoichiometric ratios of molecules in cocrystals
can significantly influence prediction performance, highlighting
the importance of data quality. In addition, complementing the
molecular graph representation with extra features is not suitable
for cocrystal density prediction, suggesting that a complementary
strategy should consider whether the extra features can sufficiently
supply the information missing from the original representation.
Based on these results, 4144 cocrystals with a 1:1 stoichiometric
ratio are selected as the dataset, which is augmented by exchanging
the order of each pair of coformers. The molecular graph is adopted
as the input from which the GNN-based model learns its feature
representation. Global attention is introduced to further optimize
the feature space and to identify important atoms, making the model
interpretable.
Benefiting from these advantages, our model significantly outperforms
three competitive models and exhibits high prediction accuracy for
unseen cocrystals, demonstrating its robustness and generality. Overall,
our work not only provides a general cocrystal density prediction
tool for experimental investigations but also provides useful guidelines
for machine learning applications. All source code is freely
available at .
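
Two of the steps described above, the coformer-exchange augmentation and
the global-attention readout over atoms, can be illustrated with a minimal
Python sketch. It is not taken from the released code: the function names,
the SMILES strings, the density value, and the single-vector attention gate
are all hypothetical, and in the actual model the atom features and the gate
would be learned by the GNN rather than supplied by hand.

```python
import math


def swap_augment(records):
    """Return the records plus a copy of each with the two coformers exchanged.

    Each record is a (coformer_a, coformer_b, density) tuple. The density
    label does not depend on coformer order, so both orderings share it.
    """
    augmented = []
    for coformer_a, coformer_b, density in records:
        augmented.append((coformer_a, coformer_b, density))
        if coformer_a != coformer_b:  # skip exact duplicates
            augmented.append((coformer_b, coformer_a, density))
    return augmented


def global_attention_pool(atom_features, gate_weights):
    """Softmax-weighted sum of atom feature vectors.

    Returns (pooled_vector, attention_weights). The gate here is a single
    hypothetical weight vector; a trained model would learn it.
    """
    # One scalar score per atom: dot product of the gate with its features.
    scores = [sum(w * x for w, x in zip(gate_weights, feats))
              for feats in atom_features]
    # Numerically stable softmax over atoms.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of atom features gives the graph-level vector.
    dim = len(atom_features[0])
    pooled = [sum(weights[i] * atom_features[i][d]
                  for i in range(len(atom_features)))
              for d in range(dim)]
    return pooled, weights


# Hypothetical 1:1 cocrystal record; the density value is a placeholder.
records = [("CCO", "c1ccccc1", 1.23)]
augmented = swap_augment(records)

# Two atoms with two features each; the gate favors the first feature.
pooled, weights = global_attention_pool([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
```

In this sketch the attention weights sum to one over the atoms of a graph,
so a larger weight marks an atom that contributes more to the pooled
representation, which is the sense in which such a readout can point to
important atoms.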