Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets

Westen, Gerard J. P. van; Swier, Remco F.; Cortés-Ciriano, Isidro; Wegner, Jörg K.; Overington, John P.; IJzerman, Adriaan P.; Vlijmen, Herman van; Bender, Andreas

doi:10.1186/1758-2946-5-42

Cited by 72 publications (91 citation statements)

References 57 publications (88 reference statements)


“…We found that all descriptors, with the exception of SOCN, performed at the same level of statistical significance (ANOVA P > 0.05; Tukey’s Honestly Significant Difference (HSD), α = 0.05, n = 15) (Table 1 and Figure 2A). These results are in agreement with previous studies, where model performance did not consistently vary across amino acid descriptor sets, and where prediction errors for individual targets were higher than the error differences obtained with models trained on different combinations of amino acid descriptors (refs 26, 30, 32). Therefore, we conclude that the predictive signal provided by all protein descriptors except SOCN for the modelling of PARP inhibition is comparable.…”

Section: Results (supporting)

confidence: 93%
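The statement above compares descriptor sets with a one-way ANOVA followed by Tukey's HSD over n = 15 resamples. Below is a minimal Python sketch of the F statistic behind that ANOVA. It assumes each group holds the test-set errors of one descriptor set, and the numbers are illustrative, not data from either paper. Converting F into a P value and running Tukey's HSD require the F and studentized-range distributions (for example `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`). This stdlib-only sketch stops at the statistic itself.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (e.g. RMSE values per descriptor set)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical test-set RMSE values for three descriptor sets (illustrative only).
rmse_by_descriptor = {
    "set_A": [1.0, 2.0, 3.0],
    "set_B": [2.0, 3.0, 4.0],
    "set_C": [3.0, 4.0, 5.0],
}
f_stat = one_way_anova_f(list(rmse_by_descriptor.values()))
print(f_stat)  # 3.0
```

A large F means the differences between descriptor sets are large relative to the resample-to-resample noise within each set. The finding quoted above (P > 0.05) corresponds to the opposite case.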


Prediction of PARP Inhibition with Proteochemometric Modelling and Conformal Prediction

Cortés-Ciriano; Bender; Malliavin (2015), Molecular Informatics (self-citation)


Poly(ADP-ribose) polymerases (PARPs) play a key role in DNA damage repair. PARP inhibitors act as chemo- and radiosensitizers and thus potentiate the cytotoxicity of DNA-damaging agents. Although PARP inhibitors are currently being investigated as chemotherapeutic agents, their cross-reactivity with other members of the PARP family remains unclear. Here, we apply Proteochemometric Modelling (PCM) to model the activity of 181 compounds on 12 human PARPs. We demonstrate that PCM (R0² (test) = 0.65–0.69; RMSE (test) = 0.95–1.01 °C) displays higher performance on the test set (interpolation) than Family QSAR and Family QSAM (Tukey's HSD, α = 0.05), and outperforms Inductive Transfer of knowledge among targets (Tukey's HSD, α = 0.05). We benchmark the predictive signal of 8 amino acid and 11 full-protein sequence descriptors, finding that all of them (except SOCN) perform at the same level of statistical significance (Tukey's HSD, α = 0.05). The extrapolation power of PCM to new compounds (RMSE = 1.02 ± 0.80 °C) and new targets (RMSE = 1.03 ± 0.50 °C) is comparable to interpolation, although the extrapolation ability is not uniform across the chemical and the target space. For this reason, we also provide confidence intervals calculated with conformal prediction. In addition, we present the R package conformal, which permits the calculation of confidence intervals for regression and classification caret models.
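The abstract attaches confidence intervals from conformal prediction to regression predictions. Below is a minimal Python sketch of split (inductive) conformal regression. It assumes absolute residuals on a held-out calibration set as the nonconformity score. It is a generic illustration, not the API of the R package conformal or necessarily the exact score used in the paper, and the calibration values are invented.

```python
import math


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.2):
    """Split conformal prediction interval for one new regression prediction.

    cal_true / cal_pred: observed and predicted values on a calibration set
    that was NOT used to train the model.
    """
    # Nonconformity score: absolute residual on each calibration example.
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank of the (1 - alpha) quantile (1-based).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this confidence level: interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)


cal_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_pred = [1.5, 2.0, 2.0, 4.5, 5.0, 7.0, 7.0, 8.5, 9.0]
interval = split_conformal_interval(cal_true, cal_pred, new_pred=3.0)
print(interval)  # (2.0, 4.0)
```

Suppose calibration and new examples are exchangeable. Then an interval built this way contains the true value with probability at least 1 - alpha. With a constant-width score like this one, every compound gets the same interval width. A normalized score would be needed for the intervals to widen where extrapolation is less reliable.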


“…To determine which protein descriptors provide the highest predictive signal, we benchmarked 8 binding-site amino acid descriptors (Table 1A; refs 29, 30) and 11 full-protein sequence descriptors (Table 1B; ref. 31). We trained 15 models for each combination of compound and protein descriptors, each time using different resamples to define the training and test sets.…”

Section: Results (mentioning)

confidence: 99%
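The statement above trains 15 models per compound/protein descriptor combination, each on a different resample of the training and test sets. A minimal Python sketch of generating such resamples follows. The helper name `resample_splits`, the 30% test fraction, and reusing the same seeded splits for every descriptor combination are all assumptions for illustration, not details taken from the source.

```python
import random


def resample_splits(n_samples, n_resamples=15, test_fraction=0.3, seed=0):
    """Hypothetical helper: n_resamples random (train, test) index splits."""
    rng = random.Random(seed)  # seeded so the same splits can be reused for every descriptor set
    n_test = round(n_samples * test_fraction)
    splits = []
    for _ in range(n_resamples):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # The first n_test shuffled indices form the test set, the rest the training set.
        splits.append((sorted(idx[n_test:]), sorted(idx[:n_test])))
    return splits


splits = resample_splits(100)
train, test = splits[0]
print(len(splits), len(train), len(test))  # 15 70 30
```

Reusing identical splits across descriptor combinations makes the comparison paired: each descriptor set is scored on exactly the same 15 test sets. That is one reasonable design for the ANOVA/Tukey comparison, not necessarily the one used in the paper.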

Prediction of PARP Inhibition with Proteochemometric Modelling and Conformal Prediction

Cortés-Ciriano; Bender; Malliavin (2015), Molecular Informatics (self-citation)


“…PCM is a branch of chemometrics that uses mathematical and statistical approaches to model the interactions between a series of ligands and a set of receptors. One major strength of PCM is that it does not require structural information on the proteins to provide specific information about their functions. Since its introduction by Lapinsh et al. in 2001, the approach has been successfully applied to different protein families such as cytochrome P450, kinases, melanocortin receptors, G protein‐coupled receptors, HIV proteases, aromatases, carbonic anhydrases, and phosphodiesterases.…”

Section: Introduction (mentioning)

confidence: 99%

Probing the origin of dihydrofolate reductase inhibition via proteochemometric modeling

Hariri; Ghasemi; Shirini; et al. (2018), Journal of Chemometrics


Dihydrofolate reductase (DHFR) is an essential enzyme in the folate metabolism pathway and an important target of antineoplastic, antimicrobial, antiprotozoal, and anti‐inflammatory drugs. Despite the clinical effectiveness of current antifolate treatments, new drugs need to be designed because the enzyme develops resistance through multiple‐site mutagenesis. Understanding the factors that affect ligand‐binding selectivity profiles among DHFR families is critical for designing novel potent and selective inhibitors, with minimal side effects, against the DHFR of pathogens. Hybrid scaffolds containing a pyrimidine ring are effective in DHFR inhibition. In this study, we used proteochemometric (PCM) modeling to design and evaluate new potent pyrimidine scaffold‐based inhibitors. 3‐dimensional alignment‐free GRid‐INdependent Descriptors (GRIND) and VolSurf molecular descriptors served as ligand descriptors, and sequence‐based (z‐scale) descriptors served as receptor descriptors. The validity and robustness of the model were confirmed by venetian blinds cross‐validation and Y‐scrambling, respectively. Applicability domain (AD) analysis was performed to estimate the likelihood of reliable predictions for compounds. To show the applicability of the PCM model, new ligands were designed using structural data retrieved from this model. The inhibitory activities of the designs were then predicted, and their selectivity ratio profiles were investigated. Finally, potent and highly selective inhibitors were identified for the protozoan parasite Toxoplasma gondii, followed by an evaluation of the ADMET parameters of the ligands.
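The abstract validates its model with venetian blinds cross-validation. In this scheme, every n_folds-th sample (by row order) goes into the same test fold. Below is a minimal Python sketch. It assumes the rows are kept in their stored order, and the function names and fold count are illustrative rather than taken from the paper.

```python
def venetian_blinds_folds(n_samples, n_folds):
    """Venetian blinds CV: sample i goes to test fold i % n_folds."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_samples):
        folds[i % n_folds].append(i)
    return folds


def venetian_blinds_splits(n_samples, n_folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    folds = venetian_blinds_folds(n_samples, n_folds)
    for k, test in enumerate(folds):
        # Training set: every sample that is not in the held-out fold.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train), test


print(venetian_blinds_folds(10, 3))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Unlike random k-fold, the assignment is deterministic. If the rows are sorted (for example by activity), each fold samples the whole range evenly.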


“…Unlike traditional QSAR, the proteochemometric modeling (PCM) approach correlates, in addition to ligand descriptors, descriptors of proteins and cross‐terms made from descriptors of ligands and proteins with the activity data for protein–ligand interactions. A recent study has revealed that combining descriptors from different aspects may help increase the performance of proteochemometric modeling.…”

Section: Introduction (mentioning)

confidence: 99%
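The statement above describes a PCM input vector made of ligand descriptors, protein descriptors, and ligand/protein cross-terms. Below is a minimal Python sketch of that construction. It assumes the cross-terms are all pairwise products of one ligand descriptor and one protein descriptor, a common choice. The descriptor values are hypothetical placeholders, not real z-scales or GRIND/VolSurf values.

```python
def pcm_features(ligand, protein):
    """Ligand descriptors + protein descriptors + all ligand x protein cross-terms."""
    # Cross-terms: the product of every ligand descriptor with every protein descriptor,
    # which lets a linear model capture ligand-protein interaction effects.
    cross = [l_val * p_val for l_val in ligand for p_val in protein]
    return list(ligand) + list(protein) + cross


x = pcm_features([1.0, 2.0], [3.0, 4.0, 5.0])
print(len(x))  # 11
print(x)       # [1.0, 2.0, 3.0, 4.0, 5.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
```

The number of cross-terms grows as (ligand descriptors) × (protein descriptors), so descriptor blocks may need scaling or compression before multiplication.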

Proteochemometric Modeling of the Interaction Space of Carbonic Anhydrase and its Inhibitors: An Assessment of Structure‐based and Sequence‐based Descriptors

Rasti; Namazi; Karimi‐Jafari; et al. (2016), Molecular Informatics


Due to its physiological and clinical roles, carbonic anhydrase (CA) is one of the most interesting case studies. There are different classes of CA inhibitors, including sulfonamides, polyamines, coumarins and dithiocarbamates (DTCs). However, few of them act as selective inhibitors of a specific isoform. Therefore, finding highly selective inhibitors for the different CA isoforms remains an ongoing challenge. Proteochemometric modeling (PCM) can model the bioactivity of multiple compounds against different isoforms of a protein. It is therefore particularly applicable to investigating the selectivity of different ligands towards different receptors. We therefore applied PCM to investigate the interaction space and the structural properties that lead to the selective inhibition of CA isoforms by a set of dithiocarbamates. Our models provide structural information that can be used to design compounds that inhibit different CA isoforms more selectively. The validity and predictivity of the models were confirmed by both internal and external validation, while Y‐scrambling was applied to assess their robustness. To demonstrate the reliability and applicability of our findings, we showed how ligand–receptor selectivity is affected when any of these critical findings is removed from the modeling process.
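The abstract uses Y-scrambling to test robustness. The model is refitted after randomly permuting the activity values. If the original fit is not a chance correlation, the scrambled models should score far below the real one. Below is a minimal Python sketch, assuming a one-descriptor least-squares model and toy data for brevity. The cited models are multivariate, and the function names here are illustrative.

```python
import random
from statistics import mean


def r2_simple_linear(x, y):
    """R^2 of an ordinary least-squares fit y = a + b*x with a single descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def y_scrambling(x, y, n_rounds=100, seed=0):
    """Return the R^2 of the real model and the R^2 values after permuting y."""
    rng = random.Random(seed)
    real = r2_simple_linear(x, y)
    scrambled = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # destroy the descriptor-activity correspondence
        scrambled.append(r2_simple_linear(x, y_perm))
    return real, scrambled


x = list(range(20))
y = [2 * xi + 1 for xi in x]  # toy data with a perfect linear relationship
real, scrambled = y_scrambling(x, y)
print(real)             # 1.0
print(mean(scrambled))  # mean R^2 of the scrambled models
```

In practice the real model's cross-validated statistics, not just R^2 on the training data, are compared with the scrambled distribution.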
