Peng Gao scite author profile

An effective self-supervised framework for learning expressive molecular global representations to drug discovery

Producing expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and poor generalization capability. Here, we propose a novel graph-based deep learning framework for molecular pre-training, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose MolGNet, a powerful GNN for modeling molecular graphs, and design an effective self-supervised strategy for pre-training the model at both the node and graph levels. After pre-training on 11 million unlabeled molecules, we show that MolGNet can capture valuable chemical insights to produce interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug-drug interaction, and drug-target interaction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
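To make the idea of a GNN encoding a molecular graph more concrete, here is a minimal, generic message-passing sketch in NumPy. It is not MolGNet: the sum aggregation, the ReLU update, the mean-pooling readout, the toy atom features and all function names are illustrative assumptions.

```python
import numpy as np

def message_passing_layer(h, edges, w):
    """One round of sum-aggregation message passing (generic, not MolGNet).

    h: (num_atoms, dim) node features; edges: undirected bonds as (i, j);
    w: (dim, dim) weight matrix shared by all atoms (an assumption here).
    """
    agg = np.zeros_like(h)
    for i, j in edges:
        # each bond carries a message in both directions
        agg[i] += h[j]
        agg[j] += h[i]
    # combine each atom's own state with its neighbors' messages, then ReLU
    return np.maximum(0.0, (h + agg) @ w)

def graph_readout(h):
    """Mean-pool node embeddings into a single graph-level vector."""
    return h.mean(axis=0)

def encode_molecule(h, edges, weights):
    """Stack several message-passing layers, then read out the whole graph."""
    for w in weights:
        h = message_passing_layer(h, edges, w)
    return graph_readout(h)

# Toy graph: the C-C-O heavy-atom skeleton of ethanol with hypothetical
# one-hot-style atom features (columns: C, O, unused, unused).
atoms = np.array([[1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
bonds = [(0, 1), (1, 2)]
rng = np.random.default_rng(0)
layers = [rng.normal(size=(4, 4)) for _ in range(2)]
embedding = encode_molecule(atoms, bonds, layers)
print(embedding.shape)  # (4,)
```

Because the readout averages over atoms, the graph-level vector does not depend on how the atoms are numbered. A downstream task head could then be a single linear layer applied to `embedding`, which mirrors the "one additional output layer" idea in spirit only.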


General Protocol for the Accurate Prediction of Molecular ¹³C/¹H NMR Chemical Shifts via Machine Learning Augmented DFT

Gao, Zhang, Peng et al., 2020, J. Chem. Inf. Model.


An accurate prediction of NMR chemical shifts at affordable computational cost is very important for many types of structural assignment in experimental studies. Density functional theory (DFT) and the gauge-including atomic orbital (GIAO) approach are among the most popular computational methods for NMR calculation, yet they often fail to resolve ambiguities in structural assignments. Here, we present a new method (DFT + ML) that uses machine learning (ML) to significantly increase the accuracy of ¹³C/¹H NMR chemical shift prediction for a variety of organic molecules. The input of the generalizable DFT + ML model has two critical parts: a vector describing the chemical environment, which can be evaluated without knowing the exact geometry of the molecule, and the DFT-calculated isotropic shielding constant. The DFT + ML model was trained on a data set containing 476 ¹³C and 270 ¹H experimental chemical shifts. For the DFT methods used here, the root-mean-square deviations (RMSDs) between predicted and experimental ¹³C/¹H chemical shifts can be as small as 2.10/0.18 ppm, much lower than those from plain DFT (5.54/0.25 ppm) or DFT + linear regression (LR) (4.77/0.23 ppm). The model also has a smaller maximum absolute error than two previously proposed NMR-predicting ML models. Its robustness was tested on two classes of organic molecules (TIC10 and hyacinthacines), for which the correct isomers were unambiguously matched to the experimental data. Overall, the DFT + ML model shows promise for structural assignments in a variety of systems, including stereoisomers, that are often challenging to determine experimentally.
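The two-part model input described above (a geometry-free environment vector plus the DFT shielding constant) can be sketched as a feature concatenation followed by a regressor. This is a hedged illustration on purely synthetic data: the integer environment encoding, the use of closed-form ridge regression as the learner, and every function name are assumptions, not the model from the paper.

```python
import numpy as np

def build_features(env_vectors, shielding):
    """Concatenate a geometry-free environment vector with the DFT isotropic
    shielding constant, one row per atom (hypothetical encoding)."""
    env = np.asarray(env_vectors, dtype=float)
    sigma = np.asarray(shielding, dtype=float).reshape(-1, 1)
    return np.hstack([env, sigma])

def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalized intercept.
    Ridge is a stand-in learner, not necessarily the paper's ML model."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ np.asarray(y, dtype=float))

def predict(coef, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ coef

def rmsd(pred, exp):
    """Root-mean-square deviation between predicted and experimental shifts."""
    pred = np.asarray(pred, dtype=float)
    exp = np.asarray(exp, dtype=float)
    return float(np.sqrt(np.mean((pred - exp) ** 2)))

# Synthetic demo data (not from the paper): 60 atoms with a made-up
# linear relation between features and "experimental" shifts.
rng = np.random.default_rng(42)
env = rng.integers(0, 3, size=(60, 5))
sigma = rng.normal(100.0, 30.0, size=60)
delta_exp = 180.0 - 0.95 * sigma + env @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])

X = build_features(env, sigma)
coef = fit_ridge(X, delta_exp)
print("environment + shielding RMSD:", rmsd(predict(coef, X), delta_exp))

# Dropping the environment columns leaves a regression on sigma alone,
# analogous in spirit to the DFT + LR baseline.
X_lr = build_features(np.empty((60, 0)), sigma)
coef_lr = fit_ridge(X_lr, delta_exp)
print("shielding-only RMSD:", rmsd(predict(coef_lr, X_lr), delta_exp))
```

On this synthetic set the shielding-only fit cannot account for the environment term, which is the qualitative reason an environment descriptor can lower the RMSD; real gains depend on the data and the learner.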


Towards an Accurate Prediction of Nitrogen Chemical Shifts by Density Functional Theory and Gauge‐Including Atomic Orbital

Gao, Wang, 2018, Adv. Theory Simul.


An efficient yet accurate computational protocol for predicting nitrogen nuclear magnetic resonance (NMR) chemical shifts, based on density functional theory and the gauge-including atomic orbital approach, is proposed. A database of small, relatively rigid nitrogen-containing compounds is compiled. Scaling factors for the linear correlation between experimental ¹⁵N chemical shifts and calculated isotropic shielding constants are systematically investigated at seven different levels of theory in both chloroform and dimethyl sulfoxide (DMSO), two solvents commonly used in NMR experiments. The best method yields root-mean-square deviations of about 5.30 and 7.00 ppm in CHCl₃ and DMSO, respectively. In addition, a dedicated set of scaling factors for -NH₂ chemical shifts is proposed, based on a separate database and three levels of theory. Encouragingly, the linear correlation is also reasonably transferable between the two solvents, which should enable broader application of the empirical scaling factors to other solvents commonly used in NMR experiments. The consistency between theoretical predictions and experimental results in structural elucidation is illustrated for selected examples, including regioisomers, tautomers, oxidation states, and protonated structures.
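The linear scaling step can be sketched as follows, assuming the common convention of fitting calculated shielding against experimental shift and then inverting that line to obtain a predicted shift. The function names and the training numbers are hypothetical, not values from the paper's database.

```python
import numpy as np

def fit_scaling_factors(delta_exp, sigma_calc):
    """Least-squares fit of sigma_calc = slope * delta_exp + intercept.
    np.polyfit returns coefficients highest power first: [slope, intercept]."""
    slope, intercept = np.polyfit(np.asarray(delta_exp, dtype=float),
                                  np.asarray(sigma_calc, dtype=float), 1)
    return float(slope), float(intercept)

def scaled_shift(sigma_calc, slope, intercept):
    """Invert the fitted line: delta_pred = (sigma_calc - intercept) / slope."""
    return (np.asarray(sigma_calc, dtype=float) - intercept) / slope

def rmsd(pred, exp):
    """Root-mean-square deviation in ppm."""
    pred = np.asarray(pred, dtype=float)
    exp = np.asarray(exp, dtype=float)
    return float(np.sqrt(np.mean((pred - exp) ** 2)))

# Hypothetical training pairs (made-up numbers, in ppm).
delta_train = [30.0, 120.0, 250.0, 310.0, 380.0]
sigma_train = [205.0, 118.0, -8.0, -66.0, -140.0]
slope, intercept = fit_scaling_factors(delta_train, sigma_train)
pred = scaled_shift(sigma_train, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.1f}, "
      f"RMSD={rmsd(pred, delta_train):.2f} ppm")
```

In a protocol like the one described, one such (slope, intercept) pair would be fitted for each level of theory and each solvent; the same regression idea applies to other nuclei.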


¹¹B NMR Chemical Shift Predictions via Density Functional Theory and Gauge-Including Atomic Orbital Approach: Applications to Structural Elucidations of Boron-Containing Molecules

Gao, Wang, Huang et al., 2019, ACS Omega


¹¹B nuclear magnetic resonance (NMR) spectroscopy is a useful tool for studying boron-containing compounds, both for structural analysis and for monitoring reaction kinetics. A computational protocol aimed at accurately predicting ¹¹B NMR chemical shifts via linear regression was proposed, based on density functional theory and the gauge-including atomic orbital approach. Following the procedure used for carbon, hydrogen, and nitrogen chemical shift predictions, a database of boron-containing molecules was first compiled. Scaling factors for the linear regression between calculated isotropic shielding constants and experimental chemical shifts were then fitted at eight different levels of theory with two solvent models: the solvation model based on density (SMD) and the conductor-like polarizable continuum model (CPCM). With these two solvent models, the best method yields root-mean-square deviations of about 3.40 and 3.37 ppm, respectively. To explore the capabilities and potential limitations of the protocols, classical boron–hydrogen compounds and molecules with representative boron bonding environments were chosen as test cases, and consistency between experimental values and theoretical predictions was demonstrated.


Predicting Thermophilic Proteins by Machine Learning

Wang, Gao, Yifeng et al., 2020, CBIO


Background: Thermophilic proteins remain active at high temperatures, so studying them is important for understanding protein thermal stability. Objective: To address the low precision and low efficiency of existing approaches to predicting thermophilic proteins, this paper proposes a prediction method based on feature fusion and machine learning. Method: For the selected thermophilic data sets, each protein sequence was first encoded by fusing three feature types: g-gap dipeptide composition, entropy density, and autocorrelation coefficients. Kernel principal component analysis (KPCA) was then used to reduce the dimensionality of these sequence features, shortening training time and improving efficiency. Finally, a classification algorithm was used to build the prediction model. Results: Several classification algorithms were trained and tested on the selected thermophilic dataset. In this comparison, the support vector machine (SVM) exceeded 92% accuracy under the jackknife test, and the other evaluation metrics also showed that the SVM performed best. Conclusion: Because it combines an effective feature representation with a robust classifier, the proposed method is suitable for predicting thermophilic proteins and outperforms most previously reported methods.
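One of the fused features, the g-gap dipeptide composition, can be sketched as the normalized frequency of each of the 400 residue pairs separated by g intervening residues (g = 0 gives ordinary dipeptide composition). The pairing convention, the normalization by the number of pair positions, and skipping non-standard residues are assumptions of this sketch; the paper's exact definition may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}

def g_gap_dipeptide(seq, g):
    """Return a 400-dim vector of frequencies of pairs (seq[k], seq[k + g + 1]).

    Counts are divided by the number of pair positions, len(seq) - g - 1.
    Pairs containing non-standard residues are skipped (an assumption).
    """
    seq = seq.upper()
    n_pairs = len(seq) - g - 1
    vec = [0.0] * len(PAIRS)
    if n_pairs <= 0:
        return vec  # sequence too short for this gap
    for k in range(n_pairs):
        idx = PAIR_INDEX.get(seq[k] + seq[k + g + 1])
        if idx is not None:
            vec[idx] += 1.0
    return [count / n_pairs for count in vec]

# Hypothetical short sequence, gap g = 2.
features = g_gap_dipeptide("MKVLAAGIVALLLAA", g=2)
nonzero = {PAIRS[i]: round(f, 3) for i, f in enumerate(features) if f > 0}
print(len(features), nonzero)
```

In the described pipeline, vectors like this would be concatenated with entropy-density and autocorrelation features before KPCA and classification; those steps are not shown here.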


On diversity reception over fading channels with impulsive noise

Tepedelenlioğlu, Gao


Space-time coding over fading channels with impulsive noise

Gao, Tepedelenlioğlu, 2007, IEEE Trans. Wireless Commun.


Accurate predictions of aqueous solubility of drug molecules via the multilevel graph convolutional network (MGCN) and SchNet architectures

Gao, Zhang, Sun et al., 2020, Phys. Chem. Chem. Phys.

