Comparative Study of Multitask Toxicity Modeling on a Broad Chemical Space

Sosnin, Sergey; Karlov, Dmitry S.; Tetko, Igor V.; Fedorov, Maxim V.

doi:10.1021/acs.jcim.8b00685

Cited by 80 publications

(85 citation statements)

References 54 publications

“…Moreover, it can be observed that the top three models of all the datasets were mainly occupied by the descriptor-based models (the ratio is 24/33=73%), substantiating the more powerful predictive abilities of the descriptor-based models compared to the graph-based models. Here what we found is that the graph-based models can outperform the descriptor-based models on some larger or multi-task datasets such as the HIV, Tox21 and ToxCast datasets, which accords well with the previous conclusions where DNN excel at larger amounts of data and multi-task learning [65,66]. However, to build such generalizable and robust deep models requires large-scale high-quality datasets and the datasets in the practical drug discovery campaigns routinely suffer from narrow chemical diversity and insignificant sample sizes [67].…”

supporting

confidence: 91%

“…Numerous studies demonstrated that multi-task models have advantages over single-task models due to their ability to excavate the inconspicuous hidden relations between different subtasks and transparently share the learned features among all the tasks. [58,65,66] Nevertheless, the performance of multi-task models is highly related to the favorable correlations of individual tasks but such ready-to-use tasks are not so commonly seen in practical drug discovery campaigns.…”

Section: Performance Of Descriptor-based and Graph-based Models

mentioning

confidence: 99%

Could Graph Neural Networks Learn Better Molecular Representation for Drug Discovery? A Comparison Study of Descriptor-based and Graph-based Models

Jiang

Hsieh

et al. 2020

Preprint

Graph neural networks (GNN) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNN could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.

“…Sml2canSml was added as CDDD descriptors to OCHEM. These descriptors were analysed by the same methods as used in the previous work, i.e., LibSVM [57], Random Forest [58], XGBoost [59] as well as by Associative Neural Networks (ASNN) [60] and Deep Neural Networks [61]. Exactly the same protocol, fivefold cross-validation, was used for all calculations.…”

Section: QSAR Modeling

mentioning

confidence: 99%
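The citation statement above says every method was evaluated with the same fivefold cross-validation protocol. As a rough illustration of what such a protocol involves, here is a minimal sketch of a five-fold index split in plain Python. The function name `k_fold_indices`, the shuffling seed, and the sample count of 23 are assumptions made for illustration; this is not the OCHEM implementation, whose splitting details are not given in the quoted text.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only; not taken from the cited papers.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    # Shuffle once with a fixed seed so every model sees identical folds.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Example: 23 hypothetical compounds split into five folds.
folds = list(k_fold_indices(n_samples=23, k=5))
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {fold_number}: {len(train_idx)} train / {len(test_idx)} test")
```

Fixing the seed and reusing the same folds for every algorithm is one way to keep a comparison between methods on equal footing, which is presumably the point of using "exactly the same protocol".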

Transformer-CNN: Swiss knife for QSAR modeling and interpretation

2020

Self Cite

We present SMILES-embeddings derived from the internal encoder state of a Transformer [1] model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture upon the embeddings results in higher quality interpretable QSAR/QSPR models on diverse benchmark datasets including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for training and inference, and thus the prognosis is based on an internal consensus. That both the augmentation and transfer learning are based on embeddings allows the method to provide good results for small datasets. We discuss the reasons for such effectiveness and draft future directions for the development of the method. The source code and the embeddings needed to train a QSAR model are available on https://github.com/bigchem/transformer-cnn. The repository also has a standalone program for QSAR prognosis which calculates individual atoms contributions, thus interpreting the model's result. OCHEM [3] environment (https://ochem.eu) hosts the on-line implementation of the method proposed.

“…Their detailed description can be found elsewhere [4]. Associative Neural Networks (ASNN) [11], Deep Neural Network (DNN) [12], Extreme Gradient Boost (XGBOOST) [13], and Least Squares Support Vector Machine (LSSVM) [14] algorithms were analyzed for training the models. The methods were used with default parameters as specified on the OCHEM web site.…”

Section: Methods

mentioning

confidence: 99%

Analysis and Modelling of False Positives in GPCR Assays

Ghosh

Tetko

Klebl

et al. 2019

Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions

Self Cite

G-Protein Coupled Receptors (GPCR) are involved in all the major signaling pathways. As a result, they often serve as potential targets for therapeutic drugs. In this study we analyze publicly available assays involving different classes of GPCR to identify false positives. Using the latest developments in Machine Learning, we then build models that can predict such compounds with high confidence. Given the ubiquity of GPCR assays, we believe such models will be very helpful in flagging potential false positives for further testing.
