2020
DOI: 10.1021/acs.cgd.0c00767

Machine-Learning-Guided Cocrystal Prediction Based on Large Data Base

Abstract: A machine-learning model trained on the whole Cambridge Structural Database was developed to assist high-throughput cocrystal screening. With only 2D structures taken as inputs, the probability of cocrystal formation is returned for two given molecules. All of the cocrystal records in the CSD were used as positive samples, while negative samples were constructed by randomly combining different molecules into chemical pairs. Our model showed a prediction ability comparable with that of a widely used ab initio m…
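The negative-sample construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `sample_negative_pairs`, the exclusion of known cocrystal pairs, the fixed seed, and the toy molecule labels are all assumptions made for illustration, and the abstract does not state the ratio of negative to positive samples.

```python
import random
from itertools import chain


def sample_negative_pairs(positive_pairs, n_negatives, seed=0):
    """Draw random molecule pairs that are not among the known cocrystal pairs.

    Hypothetical helper: the source only says negatives were built by randomly
    combining different molecules; excluding known positives is an assumption.
    """
    rng = random.Random(seed)
    # Store pairs order-independently so (A, B) and (B, A) count as one pair.
    positives = {frozenset(p) for p in positive_pairs}
    molecules = sorted(set(chain.from_iterable(positive_pairs)))

    # Guard against asking for more negatives than distinct pairs exist.
    max_pairs = len(molecules) * (len(molecules) - 1) // 2
    known = len({p for p in positives if len(p) == 2})
    if n_negatives > max_pairs - known:
        raise ValueError("not enough unseen pairs to sample from")

    negatives = set()
    while len(negatives) < n_negatives:
        a, b = rng.sample(molecules, 2)  # two different molecules
        pair = frozenset((a, b))
        if pair not in positives:
            negatives.add(pair)
    # Sort for a reproducible return order.
    return sorted(tuple(sorted(p)) for p in negatives)


# Toy labels standing in for 2D structures (e.g. SMILES strings).
positives = [("A", "B"), ("C", "D"), ("A", "C")]
negatives = sample_negative_pairs(positives, n_negatives=3)
```

Pairs produced this way are only presumed negatives: a randomly combined pair may be a cocrystal that has simply not been reported yet.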

Citations: cited by 50 publications (72 citation statements)
References: 55 publications
“…It is difficult to retrieve negative crystallization samples in the databases and the literature. Experimentally constructed negative samples will lead to the narrow applicability domain. The generated artificial invalid molecular pairs lack real experimental evidence and inevitably contain positive samples that have not been confirmed, which might affect the model accuracy. Although one classification model reported recently can build a predictive model without negative data, this model is still trained on a type of cocrystal system with a limited applicability domain.…”
Section: Discussion (mentioning)
confidence: 99%
“…11−13 To enlarge the applicability domain of such classification models, artificially invalid molecular combinations and cocrystals retrieved from Cambridge Structural Database are merged to train the machine learning or deep learning models. 14,15 The lattice energy landscapes of molecular combination and two pure components have been explored to determine cocrystal formation. 16−18 Hydrogen bond energies (HBE) between two molecules calculated via molecular electrostatic potential surfaces have also been utilized for cocrystal virtual screening.…”
Section: Introduction (mentioning)
confidence: 99%
“…Various ML models have been applied for cocrystal prediction, including support vector machines (SVMs) [30], multivariate adaptive regression splines [31], random forest (RF) [32], and network-based link-prediction [33]. In our previous work, we developed a virtual screening model based on an artificial neural network (ANN) algorithm for cocrystal prediction [34].…”
Section: Introduction (mentioning)
confidence: 99%
“…Devogelaer and co-workers introduced a comprehensive approach to study cocrystallization using network science and linkage prediction algorithms and constructed a data-driven co-crystal prediction tool with co-crystal data extracted from the CSD [42]. Wang et al. also used a data set with co-crystal data available in the CSD and ultimately developed a machine learning model using different model types and molecular fingerprints that can be used to select appropriate coformers for a target molecule [43]. The above existing studies have shown successful results, but they have a common limitation that they only compared model performance (e.g., accuracy) without investigating features (i.e., descriptors) importance.…”
Section: Introduction (mentioning)
confidence: 99%