ET‐score: Improving Protein‐ligand Binding Affinity Prediction Based on Distance‐weighted Interatomic Contact Features Using Extremely Randomized Trees Algorithm

Rayka, Milad; Karimi‐Jafari, Mohammad Hossein; Firouzi, Rohoullah

doi:10.1002/minf.202060084

Cited by 11 publications

(20 citation statements)

References 37 publications

“…Recently, customized protein–ligand interaction features became popular in scoring function development, such as ET-score (2021) and ECIF-GBT (2021) [ 104 , 106 ]. ET-score employed protein–ligand interaction features defined by distance-weighted interatomic contacts between atom type pairs of the protein and ligand.…”

Section: Machine-learning Scoring Function

mentioning

confidence: 99%

Protein–Ligand Docking in the Machine-Learning Era

2022


Molecular docking plays a significant role in early-stage drug discovery, from structure-based virtual screening (VS) to hit-to-lead optimization, and its capability and predictive power is critically dependent on the protein–ligand scoring function. In this review, we give a broad overview of recent scoring function development, as well as the docking-based applications in drug discovery. We outline the strategies and resources available for structure-based VS and discuss the assessment and development of classical and machine learning protein–ligand scoring functions. In particular, we highlight the recent progress of machine learning scoring function ranging from descriptor-based models to deep learning approaches. We also discuss the general workflow and docking protocols of structure-based VS, such as structure preparation, binding site detection, docking strategies, and post-docking filter/re-scoring, as well as a case study on the large-scale docking-based VS test on the LIT-PCBA data set.


“…As mentioned before, after excluding the core set, the refined set 2016, the composition of the refined and the general sets 2016, and the same composition for 2019 are used as the training sets. The distance‐weighted interatomic contact featurization method was applied to protein‐ligand complexes to generate a numerical representation for them [24]. RF, ET, and GBT were adopted as fast and standard learning algorithms to discern hidden patterns in the training data.…”

Section: Results

mentioning

confidence: 99%

“…Distances with magnitude below the predefined cutoff ($d_{\text{cutoff}}$) are weighted by an inverse power of a natural number ($n$) and sum together. In our previous work, we demonstrated that 12 Å and 2 are appropriate choices for $d_{\text{cutoff}}$ and $n$, respectively [24]. The mentioned algorithm is repeated iteratively for all possible atom types pairs, and a feature vector with 400 dimensions as a representation of a protein‐ligand complex is produced [24]:

\[
\vec{X} = \left\{ X_{H,H_p},\, X_{H,C_p},\, \dots,\, X_{I,I_h} \right\}
\]

\[
X_{i,j} = \sum_{k=1}^{K_j} \sum_{l=1}^{L_i} \frac{1}{d_{lk}^{2}}
\]

…”

Section: Methods

mentioning

confidence: 99%
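The featurization quoted above can be illustrated with a minimal Python sketch. It assumes each atom is given as an (element, coordinates) tuple in angstroms, and it uses a small hypothetical set of atom types rather than the full scheme that yields 400 features; the function name `contact_features` and the type lists are illustrative and are not taken from the ET-score or GB-score code.

```python
import math
from itertools import product

# Hypothetical, reduced atom-type lists for illustration only; the quoted text
# states that the full scheme produces 400 pair features.
LIGAND_TYPES = ["C", "N", "O"]
PROTEIN_TYPES = ["C", "N", "O", "S"]


def contact_features(ligand_atoms, protein_atoms, d_cutoff=12.0, n=2):
    """Return {(ligand_type, protein_type): sum of 1 / d**n over pairs with d < d_cutoff}.

    Each atom is a tuple (element, (x, y, z)) with coordinates in angstroms.
    """
    features = {pair: 0.0 for pair in product(LIGAND_TYPES, PROTEIN_TYPES)}
    for l_type, l_xyz in ligand_atoms:
        for p_type, p_xyz in protein_atoms:
            if (l_type, p_type) not in features:
                continue  # atom type outside the illustrative type lists
            d = math.dist(l_xyz, p_xyz)
            # Only distances below the cutoff contribute; the d > 0 guard is an
            # added safeguard against division by zero, not part of the quote.
            if 0.0 < d < d_cutoff:
                features[(l_type, p_type)] += 1.0 / d ** n
    return features


# Toy complex: one ligand carbon, two protein oxygens within the cutoff,
# and one protein nitrogen beyond it.
ligand = [("C", (0.0, 0.0, 0.0))]
protein = [("O", (2.0, 0.0, 0.0)), ("O", (0.0, 4.0, 0.0)), ("N", (20.0, 0.0, 0.0))]
feats = contact_features(ligand, protein)
print(feats[("C", "O")])  # 1/2**2 + 1/4**2
```

With the defaults of 12 Å and n = 2 reported in the quote, the two oxygens contribute 1/4 and 1/16 to the (C, O) feature, while the nitrogen at 20 Å is ignored.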

GB‐score: Minimally designed machine learning scoring function based on distance‐weighted interatomic contact features

Rayka

Firouzi

2023

Molecular Informatics

Self Cite


In recent years, thanks to advances in computer hardware and dataset availability, data-driven approaches (like machine learning) have become one of the essential parts of the drug design framework to accelerate drug discovery procedures. Constructing a new scoring function, a function that can predict the binding score for a generated protein-ligand pose during docking procedure or a crystal complex, based on machine and deep learning has become an active research area in computer-aided drug design. GB-Score is a state-of-the-art machine learning-based scoring function that utilizes distance-weighted interatomic contact features, PDBbind-v2019 general set, and Gradient Boosting Trees algorithm to the binding affinity prediction. The distance-weighted interatomic contact featurization method used the distance between different ligand and protein atom types for numerical representation of the protein-ligand complex. GB-Score attains Pearson's correlation 0.862 and RMSE 1.190 on the CASF-2016 benchmark test in the scoring power metric. GB-Score's codes are freely available on the web at https://github.com/miladrayka/GB_Score.
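The two scoring-power figures quoted in this abstract, Pearson's correlation and RMSE, follow standard textbook definitions. The sketch below computes both in plain Python; the function names and the toy affinity values are made up for illustration and are not CASF-2016 data or the GB-Score evaluation code.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Made-up binding affinities (for example in pK units), for illustration only.
measured = [1.0, 2.0, 3.0]
predicted = [2.0, 4.0, 6.0]
r_value = pearson_r(measured, predicted)
error = rmse(measured, predicted)
print(r_value, error)
```

The toy data show why both numbers are reported together: the predictions are perfectly correlated with the measurements, yet the RMSE is far from zero because the predictions are systematically scaled.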


“…Additionally, the establishment of an in‐silico molecular design approach is still required [2–5] . Compared to ligand‐based drug design (LBDD), structure‐based drug design (SBDD), which is closely related to this study, is significantly advantageous in terms of identifying active compounds with novel molecular frameworks and different chemical classes [6,7] . Currently, many clinical drugs, including kinase inhibitors, such as erdafitinib, pexidartinib, and vemurafenib, have been developed through SBDD, [8] which has emerged as an effective approach for drug discovery.…”

Section: Introduction

mentioning

confidence: 99%

“…[2][3][4][5] Compared to ligand-based drug design (LBDD), structure-based drug design (SBDD), which is closely related to this study, is significantly advantageous in terms of identifying active compounds with novel molecular frameworks and different chemical classes. [6,7] Currently, many clinical drugs, including kinase inhibitors, such as erdafitinib, pexidartinib, and vemurafenib, have been developed through SBDD, [8] which has emerged as an effective approach for drug discovery. However, one of the challenges associated with this approach is the difficulty in obtaining accurate ligand-binding poses.…”

Section: Introduction

mentioning

confidence: 99%

LCP: Simple Representation of Docking Poses for Machine Learning: A Case Study on Xanthine Oxidase Inhibitors

Kawai

Asanuma

Kato

et al. 2021

Molecular Informatics


In this paper, we propose a simple descriptor called the ligand coordinate profile (LCP) for describing docking poses. The LCP descriptor is generated from the coordinates of the polar hydrogen and heavy atoms of the docked ligand. We hypothesize that the prediction of binding poses can be enhanced through the combination of machine learning methods with the LCP descriptor. Two docking programs were used to predict ligand docking against xanthine oxidase. Four machine learning methods (k‐nearest neighbors, random forest, support vector machine, and LightGBM) were used to determine whether machine learning‐based models could be used to accurately identify the correct binding poses. Regardless of the machine learning method employed, the LCP descriptor demonstrated improved performance compared to the existing descriptor. The results of the leave‐one‐pdb‐out approach revealed that the influence of the pose descriptor was also significant, as demonstrated through cross‐validation. When evaluated using top‐N metrics, the machine learning models were generally more effective than the docking programs. In addition, the LCP‐based models outperformed those based on the existing descriptor. The results obtained in this study suggest that our proposed binding pose descriptor is effective for improving the docking accuracy of xanthine oxidase inhibitors.
