Could Graph Neural Networks Learn Better Molecular Representation for Drug Discovery? A Comparison Study of Descriptor-based and Graph-based Models

Jiang, Dejun; Wu, Zhenxing; Hsieh, Chang‐Yu; Chen, Guangyong; Liao, Ben; Wang, Zhe; Shen, Chao; Cao, Dong‐Sheng; Wu, Jian; Hou, Tingjun

doi:10.21203/rs.3.rs-79416/v1

Cited by 4 publications (5 citation statements). References 51 publications.

“…To benchmark the deep model, two popular machine learning methods [8, 33], RF and SVM, were implemented with scikit-learn. The two methods were trained on the same training sets for the binary classification but taking a single compound as input using either the ECFP6 fingerprints or the tokenized SMILES strings.…”

Section: Results (mentioning)

Siamese Recurrent Neural Network with a Self-Attention Mechanism for Bioactivity Prediction

et al. 2021

Activity prediction plays an essential role in drug discovery by directing the search for drug candidates in the relevant chemical space. Despite its successful application to image recognition and semantic similarity, the Siamese neural network has rarely been explored in drug discovery, where modelling faces challenges such as insufficient data and class imbalance. Here, we present a Siamese recurrent neural network model (SiameseCHEM) based on a bidirectional long short-term memory architecture with a self-attention mechanism, which can automatically learn discriminative features from the SMILES representations of small molecules. The model is then used to categorize the bioactivity of small molecules via N-shot learning. Trained on random SMILES strings, it proves robust across five different datasets for the task of binary or categorical classification of bioactivity. Benchmarked against two baseline machine learning models that use the chemistry-rich ECFP fingerprints as input, the deep learning model performs better on three datasets and comparably on the other two. The failure of both baseline methods on SMILES strings highlights that the deep learning model may learn task-specific chemistry features encoded in SMILES strings.
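The N-shot classification step described in the abstract can be illustrated with a minimal sketch. SiameseCHEM learns its similarity from SMILES with a BiLSTM and self-attention; here, purely as an assumption for illustration, a fixed Tanimoto similarity on fingerprint bit sets stands in for that learned embedding distance, and the function names and toy data are hypothetical. A query is assigned to the class whose N labelled support examples it resembles most on average.

```python
# Minimal N-shot classification sketch (hypothetical helper names).
# Assumption: a fixed Tanimoto similarity on fingerprint bit sets replaces the
# learned BiLSTM + self-attention embedding distance used by SiameseCHEM.


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def n_shot_predict(query: frozenset[int],
                   support: dict[str, list[frozenset[int]]]) -> str:
    """Return the label whose N support examples are, on average, most similar."""
    best_label, best_score = "", -1.0
    for label, examples in support.items():
        score = sum(tanimoto(query, ex) for ex in examples) / len(examples)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy 2-shot support set: two labelled examples per class (made-up bit sets).
support = {
    "active": [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5})],
    "inactive": [frozenset({10, 11, 12}), frozenset({10, 11, 13})],
}
print(n_shot_predict(frozenset({1, 2, 3}), support))  # prints active
```

In a Siamese model the similarity function itself is trained on pairs, which is what lets it cope with small or imbalanced datasets; the nearest-support decision rule on top stays the same.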

“…Both RF and SVM prove to consistently perform well on a variety of tasks [8, 33]. For the RF, the number of trees was set to 100, and no maximum depth for the tree was specified. The Gini Index for information gain was used.…”

Section: Methods (mentioning)

Siamese Recurrent Neural Network with a Self-Attention Mechanism for Bioactivity Prediction

et al. 2021
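The baseline configuration described in the two citation statements above (RF and SVM implemented with scikit-learn; 100 trees, no maximum depth and the Gini criterion for the RF) can be sketched as follows. This is not the authors' code: random bit vectors stand in for ECFP6 fingerprints (Morgan radius 3), the labels are synthetic, and the SVM kernel and regularization, which the quotes do not specify, are left at scikit-learn's RBF defaults as an assumption. ROC AUC is used only to show how the two models would be scored.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder data: 200 "compounds" x 2048 bits. Real inputs would be ECFP6
# (Morgan radius 3) bit vectors, e.g. generated with RDKit.
X = rng.integers(0, 2, size=(200, 2048))
# Synthetic labels driven by the first 10 bits so there is some signal to learn.
y = (X[:, :10].sum(axis=1) > 5).astype(int)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# RF settings from the quoted Methods text: 100 trees, no max depth, Gini.
rf = RandomForestClassifier(n_estimators=100, max_depth=None,
                            criterion="gini", random_state=0)
# SVM hyperparameters are not given in the quotes; RBF defaults are assumed.
svm = SVC(kernel="rbf", probability=True, random_state=0)

aucs = {}
for name, model in (("RF", rf), ("SVM", svm)):
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
    aucs[name] = roc_auc_score(y_test, proba)
    print(f"{name}: test ROC AUC = {aucs[name]:.3f}")
```

`probability=True` is needed only because the sketch ranks compounds by predicted probability; `SVC.decision_function` would give an equally valid ranking without the extra internal calibration step.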

“…This falls in line with earlier research of ours [12], and is not uncommon to occur. For example, Jiang and colleagues [34] showed that the Attention FP [35], a graph neural network providing state of the art results on many benchmark sets, performs only on par with descriptor-based models. The results of the GCN can be found in the Supplementary Information.…”

Section: Results (mentioning)

Natural product scores and fingerprints extracted from artificial neural networks

Menke, Massa, Koch (2021), Computational and Structural Biotechnology Journal

“…These results come at a time when graph-based neural networks are increasingly popular for computational chemistry, although their benefits have also come into question (29). The performance of the de facto standard 2,048-bit Morgan fingerprint can be improved simply by using pharmacophoric atom invariants and/or larger fingerprint sizes.…”

Section: Discussion (mentioning)

State of the Art Iterative Docking with Logistic Regression and Morgan Fingerprints

Martin (2021), preprint

There is renewed interest in docking campaigns for ligand discovery since the advent of ultra-large virtual libraries. Given the scale of these libraries, brute-force search calls for highly parallelized compute to avoid years-long computations. This paper reports a re-analysis of docking data from an ultra-large docking campaign at the D4 receptor and AmpC β-lactamase, and demonstrates large reductions in the computation time needed to identify the top-ranked ligands. A search of ‘baseline’ featurizations shows that logistic regression on Morgan fingerprints with pharmacophoric atom invariants can match the performance reported for message-passing networks on the same task. With this approach, an ultra-large docking campaign could be performed in a matter of weeks using consumer-grade CPUs with RDKit and scikit-learn. All code and figures are available at https://github.com/ljmartin/dockop
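The Discussion quote above notes that the standard 2,048-bit Morgan fingerprint can be improved by using larger fingerprint sizes. One plausible contributor (an assumption here, not an analysis taken from the cited work) is bit collisions: each atom environment is hashed to a large integer identifier and then folded, commonly modulo the bit length, so distinct environments can land on the same bit. The sketch below uses random integers as hypothetical stand-ins for those identifiers and counts collisions at several sizes.

```python
import random


def fold(identifiers: list[int], fp_size: int) -> set[int]:
    """Fold unbounded environment identifiers into bit positions 0..fp_size-1."""
    return {i % fp_size for i in identifiers}


random.seed(0)
# Hypothetical stand-ins for 300 distinct 32-bit atom-environment identifiers;
# real ones would come from a Morgan fingerprint implementation such as RDKit's.
ids = random.sample(range(2**32), 300)

collisions = {}
for size in (1024, 2048, 4096, 16384):
    # Collisions = identifiers lost because they landed on an already-set bit.
    collisions[size] = len(ids) - len(fold(ids, size))
    print(f"{size:>6} bits: {collisions[size]} collisions")
```

Because each size here divides the next, two identifiers that collide at a larger size also collide at every smaller one, so the count can only fall as the fingerprint grows; how much that matters for model accuracy depends on the molecules and the task.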

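The workflow in the abstract above (logistic regression on Morgan fingerprints used to decide which molecules are worth docking) can be sketched as an iterative screening loop: dock a small random batch, train a classifier to recognize the best-scoring fraction, dock the molecules it ranks highest, and repeat. Everything here is an assumption for illustration rather than the dockop implementation: the fingerprints and "docking scores" are synthetic, and the batch size, number of rounds and 10% hit threshold are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_library, n_bits, batch = 5000, 512, 250

# Synthetic library: random bit vectors stand in for Morgan fingerprints.
X = rng.integers(0, 2, size=(n_library, n_bits)).astype(np.float32)
# Hypothetical docking scores (lower = better) from a hidden linear rule plus
# noise; in a real campaign every score costs one docking run.
w_true = rng.normal(size=n_bits)
scores = -(X @ w_true) + rng.normal(scale=2.0, size=n_library)
top_true = set(np.argsort(scores)[:50].tolist())  # best 1% of the library

# Round 0: dock a random batch.
docked = rng.choice(n_library, size=batch, replace=False).tolist()
for _ in range(3):
    idx = np.array(docked)
    # Label the best-scoring 10% of docked molecules as "hits".
    y = (scores[idx] <= np.percentile(scores[idx], 10)).astype(int)
    clf = LogisticRegression(max_iter=1000).fit(X[idx], y)
    # Rank the undocked molecules and "dock" the most promising batch next.
    remaining = np.setdiff1d(np.arange(n_library), idx)
    prob = clf.predict_proba(X[remaining])[:, 1]
    docked.extend(remaining[np.argsort(-prob)[:batch]].tolist())

found = len(top_true & set(docked))
print(f"docked {len(docked)} of {n_library}; found {found} of the top 50")
```

Only 1,000 of the 5,000 molecules are ever "docked", so the fraction of the true top 50 that is recovered indicates how much docking the classifier saves; the exact figure depends entirely on the synthetic data.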
