2021
DOI: 10.1021/acs.jcim.0c01489
XGraphBoost: Extracting Graph Neural Network-Based Features for a Better Prediction of Molecular Properties

Abstract: Determining the properties of chemical molecules is essential for screening candidates similar to a specific drug. These candidate molecules are further evaluated for their target binding affinities, side effects, target missing probabilities, etc. Conventional machine learning algorithms have demonstrated satisfactory prediction accuracy for molecular properties. However, a molecule cannot be loaded directly into a machine learning model; a set of engineered features first needs to be designed and calculated from the molecule. …
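The workflow the abstract describes, engineering a fixed-length feature vector from a molecule before a conventional model can consume it, can be sketched as follows. This is a minimal illustration assuming Morgan fingerprints computed with RDKit and an XGBoost classifier on toy data; the featurization choice, hyperparameters, and labels are assumptions for illustration, not the paper's exact pipeline.

# Minimal sketch: engineered features (Morgan fingerprints via RDKit) feed a
# conventional model (XGBoost). Featurization choice, parameters, and toy data
# are illustrative assumptions, not the paper's exact setup.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from xgboost import XGBClassifier

def featurize(smiles: str, radius: int = 2, n_bits: int = 2048) -> np.ndarray:
    """Convert a SMILES string into a fixed-length engineered feature vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(list(fp), dtype=np.float32)

# Hypothetical toy data: SMILES strings with binary property labels.
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
labels = [0, 1, 0, 1]

X = np.stack([featurize(s) for s in smiles_list])
model = XGBClassifier(n_estimators=100, max_depth=4)
model.fit(X, labels)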

Cited by 55 publications (43 citation statements)
References 58 publications
“…The training-validation loss ratio could serve as a heuristic for indicating overfitting in some instances, but what constitutes a suitable threshold may differ according to the model type and the dataset. Various machine-learning models, especially intricate architectures such as deep learning, have been found to remain practical even when the ratio between training loss and validation loss is high [47][48][49]. A well-established phenomenon in deep learning, as well as in some classical machine learning, addresses this aspect of the bias-variance tradeoff: the double descent risk curve [50].…”
Section: Model Results and Validation (citation type: mentioning; confidence: 99%)
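The loss-ratio heuristic discussed in this excerpt can be stated in a few lines. This is a minimal sketch; the threshold value is an assumption and, as the excerpt notes, would need to be chosen per model type and dataset.

# Minimal sketch of the training/validation loss-ratio heuristic. The 1.5
# threshold is an assumed placeholder; a suitable value depends on the model
# type and dataset, and a high ratio alone is not conclusive (see the double
# descent literature cited above).
def loss_ratio(train_loss: float, val_loss: float) -> float:
    return val_loss / train_loss

def looks_overfit(train_loss: float, val_loss: float, threshold: float = 1.5) -> bool:
    return loss_ratio(train_loss, val_loss) > threshold

print(looks_overfit(train_loss=0.12, val_loss=0.31))  # True: ratio ~2.6 exceeds the assumed threshold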
“…Extensive literature is available benchmarking algorithmic models on the aforementioned databases for VS-related tasks such as molecular property prediction, fingerprint generation, or the evaluation of structural protein-ligand docking parameters. These include the following: Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Deep Neural Networks (DNN) ( Jiang et al, 2021 ) as representatives of descriptor-based models, and many graph-based algorithm variants, such as MPNN—Message Passing Neural Networks ( Yang et al, 2019 ; Deng et al, 2021 ; Jiang et al, 2021 ); networks implementing spatial graph convolution, like GCN—Graph Convolutional Network ( Li et al, 2017 ; Xiong et al, 2020 ; Menke and Koch, 2020 ; Deng et al, 2021 ; Hsieh et al, 2020 ) or GC—Graph Convolution ( Wu et al, 2018 ); spectral graph convolution, such as AGCN—Adaptive Graph Convolution ( Li et al, 2018 ); and graph-based networks with attention mechanisms over neighboring nodes or edges, i.e., AFP—Attentive Fingerprint ( Xiong et al, 2020 ; Jiang et al, 2021 ), PAGTN—Path-Augmented Graph Transformer Network ( Chen et al, 2019 ), and EAGCN—Edge Attention GCN ( Shang et al, 2018 ), among others ( Wu et al, 2018 ; Lim et al, 2019 ).…”
Section: Introduction (citation type: mentioning; confidence: 99%)
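Several of the graph-based variants listed above build on spatial graph convolution. As a point of reference, a single GCN-style propagation step can be sketched in plain NumPy as below; this is an illustrative formulation, not the exact implementation of any of the cited networks.

# One GCN-style propagation step, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W),
# written in plain NumPy for illustration only.
import numpy as np

def gcn_layer(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt         # symmetric normalization
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU activation

# Toy 3-node path graph, 4-dimensional node features, 8-unit output.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(gcn_layer(adj, np.random.rand(3, 4), np.random.rand(4, 8)).shape)  # (3, 8)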
“…The Directed Message Passing Neural Network (D-MPNN), a graph-based model combined with Extreme Gradient Boosting (XGBoost) as a descriptor-based output layer, achieved the best results on several of the presented datasets ( Deng et al, 2021 ). Furthermore, concatenating molecular fingerprint vectors generated by conventional models with descriptors generated by graph models has been reported to provide the best prediction results when the combined vectors are submitted to the final parameter-generation layers ( Wang et al, 2019 ).…”
Section: Introduction (citation type: mentioning; confidence: 99%)
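The hybrid scheme this excerpt describes, graph-derived descriptors combined with conventional fingerprints and passed to a boosted output layer, can be sketched as follows. The embedding and fingerprint arrays here are random placeholders standing in for precomputed features; the shapes, hyperparameters, and regression task are assumptions for illustration.

# Sketch of the hybrid approach: concatenate graph-model embeddings with
# conventional fingerprint vectors, then fit XGBoost as the output layer.
# All arrays below are random placeholders for precomputed features.
import numpy as np
from xgboost import XGBRegressor

n_molecules = 100
gnn_embeddings = np.random.rand(n_molecules, 300)   # e.g. graph-model readout vectors (assumed precomputed)
fingerprints = np.random.randint(0, 2, (n_molecules, 2048)).astype(np.float32)
targets = np.random.rand(n_molecules)               # hypothetical property values

features = np.hstack([gnn_embeddings, fingerprints])  # learned + engineered features
booster = XGBRegressor(n_estimators=200, max_depth=6)
booster.fit(features, targets)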
“…Another aspect of their attractiveness for molecular property prediction is the ease with which a molecule can be described as an undirected graph, transforming atoms into nodes and bonds into edges while encoding both atom and bond properties. GNNs have proven to be useful and powerful tools in the machine-learning molecular modeling toolbox [19,20].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
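The atoms-to-nodes, bonds-to-edges encoding this excerpt refers to can be sketched with RDKit. The particular atom and bond features chosen here (atomic number, degree, aromaticity, bond order) are illustrative assumptions rather than the descriptor set of any cited model.

# Sketch: describe a molecule as an undirected graph with per-node atom
# features and per-edge bond features. Feature choices are illustrative.
from rdkit import Chem

def mol_to_graph(smiles: str):
    mol = Chem.MolFromSmiles(smiles)
    node_features = [
        (atom.GetAtomicNum(), atom.GetDegree(), int(atom.GetIsAromatic()))
        for atom in mol.GetAtoms()
    ]
    edges, edge_features = [], []
    for bond in mol.GetBonds():
        edges.append((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()))  # undirected edge, stored once
        edge_features.append(bond.GetBondTypeAsDouble())              # bond order as a simple edge feature
    return node_features, edges, edge_features

nodes, edges, bond_orders = mol_to_graph("c1ccccc1O")  # phenol: 7 atoms, 7 bonds
print(len(nodes), len(edges))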