A promising protein target for computational drug development, the human cluster of differentiation 38 (CD38), plays a crucial role in many physiological and pathological processes, primarily through the upstream regulation of factors that control cytoplasmic Ca2+ concentrations. Recently, a small-molecule inhibitor of CD38 was shown to slow down pathways relating to aging and DNA damage. We examined the performance of seven docking programs for their ability to model protein-ligand interactions with CD38. A test set of twelve CD38 crystal structures, containing crystallized biologically relevant substrates, was used to assess pose prediction. The rankings for each program based on the median RMSD between the native and predicted poses were Vina, AutoDock 4 > PLANTS, Gold, Glide, Molegro > rDock. Forty-two compounds with known affinities were docked to assess the accuracy of the programs at affinity/ranking predictions. The rankings based on scoring power were: Vina, PLANTS > Glide, Gold > Molegro >> AutoDock 4 >> rDock. Out of the top four performing programs, Glide had the only scoring function that did not appear to show bias towards overpredicting the affinity of the ligand based on its size. Factors that affect the reliability of pose prediction and scoring are discussed. General limitations and known biases of scoring functions are examined, aided in part by using molecular fingerprints and Random Forest classifiers. This machine learning approach may be used to systematically diagnose molecular features that are correlated with poor scoring accuracy.
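The pose-prediction ranking described in this abstract rests on a per-structure RMSD and a per-program median. The following is a minimal sketch of that bookkeeping, not the authors' pipeline. It assumes that atoms in the native and predicted poses are already matched one to one and share the receptor coordinate frame (so no superposition is applied); the function names and all numbers in the usage note are hypothetical.

```python
import math
from statistics import median


def pose_rmsd(native, predicted):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples.

    Assumption: atoms are listed in the same order and both poses are in the
    receptor frame, so no alignment is performed before comparing.
    """
    if not native or len(native) != len(predicted):
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(
        (a - b) ** 2
        for atom_n, atom_p in zip(native, predicted)
        for a, b in zip(atom_n, atom_p)
    )
    return math.sqrt(squared / len(native))


def rank_by_median_rmsd(rmsds_by_program):
    """Return program names sorted from lowest to highest median RMSD."""
    return sorted(rmsds_by_program, key=lambda name: median(rmsds_by_program[name]))
```

For example, `rank_by_median_rmsd({"prog_a": [1.0, 2.0, 3.0], "prog_b": [0.5, 0.6, 4.0]})` places the hypothetical `prog_b` first, because the median is insensitive to its single poor pose; this is one reason a median, rather than a mean, is a reasonable summary for small test sets.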
The calculation of the anharmonic modes of small- to medium-sized molecules for assigning experimentally measured frequencies to the corresponding type of molecular motions is computationally challenging at sufficiently high levels of quantum chemical theory. Here, a practical and affordable way to calculate coupled-cluster quality anharmonic frequencies using second-order vibrational perturbation theory (VPT2) from machine-learned models is presented. The approach, referred to as “NN + VPT2”, uses a high-dimensional neural network (PhysNet) to learn potential energy surfaces (PESs) at different levels of theory from which harmonic and VPT2 frequencies can be efficiently determined. The NN + VPT2 approach is applied to eight small- to medium-sized molecules (H2CO, trans-HONO, HCOOH, CH3OH, CH3CHO, CH3NO2, CH3COOH, and CH3CONH2) and frequencies are reported from NN-learned models at the MP2/aug-cc-pVTZ, CCSD(T)/aug-cc-pVTZ, and CCSD(T)-F12/aug-cc-pVTZ-F12 levels of theory. For the largest molecules and at the highest levels of theory, transfer learning (TL) is used to determine the necessary full-dimensional, near-equilibrium PESs. Overall, NN + VPT2 yields anharmonic frequencies to within 20 cm–1 of experimentally determined frequencies for close to 90% of the modes for the highest quality PES available and to within 10 cm–1 for more than 60% of the modes. For the MP2 PESs only ∼60% of the NN + VPT2 frequencies were within 20 cm–1 of experiment, with outliers of up to ∼150 cm–1. It is also demonstrated that the approach provides correct assignments for strongly interacting modes such as the OH bending and the OH torsional modes in formic acid monomer and the CO-stretch and OH-bend modes in acetic acid.
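As a reminder of what VPT2 delivers once a PES is available, the sketch below evaluates the standard resonance-free VPT2 expression for fundamental transitions, ν_i = ω_i + 2x_ii + ½ Σ_{j≠i} x_ij, from harmonic frequencies ω and an anharmonic constant matrix x. It is illustrative only: it assumes a symmetric x matrix, ignores Fermi and Darling-Dennison resonances (which matter for the strongly interacting modes mentioned above), and says nothing about how the constants are obtained from the NN-learned PES. The function name and the numbers in the usage note are made up.

```python
def vpt2_fundamentals(omega, x):
    """Fundamental frequencies from harmonic frequencies and anharmonic constants.

    omega: list of harmonic frequencies (cm^-1).
    x:     symmetric square matrix (list of lists) of anharmonic constants
           x[i][j] (cm^-1); resonance-free treatment is assumed.
    """
    n = len(omega)
    if len(x) != n or any(len(row) != n for row in x):
        raise ValueError("x must be an n-by-n matrix matching omega")
    fundamentals = []
    for i in range(n):
        # Off-diagonal coupling to every other mode, each weighted by 1/2.
        coupling = 0.5 * sum(x[i][j] for j in range(n) if j != i)
        fundamentals.append(omega[i] + 2.0 * x[i][i] + coupling)
    return fundamentals
```

With the made-up two-mode input `omega = [3000.0, 1500.0]` and `x = [[-50.0, -10.0], [-10.0, -5.0]]`, the first fundamental is 3000 − 100 − 5 = 2895 cm⁻¹ and the second is 1500 − 10 − 5 = 1485 cm⁻¹.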
Platinum-based chemotherapy remains the cornerstone of treatment for most non-small cell lung cancer (NSCLC) cases either as maintenance therapy or in combination with immunotherapy. However, resistance remains a primary issue. Our findings point to the possibility of exploiting levels of cell division cycle associated protein-3 (CDCA3) to improve response of NSCLC tumours to therapy. We demonstrate in patient and in vitro analyses that CDCA3 levels correlate with measures of genome instability and platinum sensitivity, whereby CDCA3high tumours are sensitive to cisplatin and carboplatin. In NSCLC, CDCA3 protein levels are regulated by the ubiquitin ligase APC/C and cofactor Cdh1. Here, we identified that the degradation of CDCA3 is modulated by activity of casein kinase 2 (CK2), which promotes an interaction between CDCA3 and Cdh1. Supporting this, pharmacological inhibition of CK2 with CX-4945 disrupts CDCA3 degradation, elevating CDCA3 levels and increasing sensitivity to platinum agents. We propose that combining CK2 inhibitors with platinum-based chemotherapy could enhance platinum efficacy in CDCA3low NSCLC tumours and benefit patients.
An essential aspect for adequate predictions of chemical properties by machine learning models is the database used for training them. However, studies that analyze how the content and structure of the databases used for training impact the prediction quality are scarce. In this work, we analyze and quantify the relationships learned by a machine learning model (Neural Network) trained on five different reference databases (QM9, PC9, ANI-1E, ANI-1, and ANI-1x) to predict tautomerization energies from molecules in Tautobase. For this, the effects of characteristics such as the number of heavy atoms in a molecule, number of atoms of a given element, bond composition, or initial geometry on the quality of the predictions are considered. The results indicate that training on a chemically diverse database is crucial for obtaining good results and also that conformational sampling can partly compensate for limited coverage of chemical diversity. The overall best-performing reference database (ANI-1x) performs on average 1 kcal/mol better than PC9, which, however, contains about 2 orders of magnitude fewer reference structures. On the other hand, PC9 is chemically more diverse by a factor of ∼5 as quantified by the number of atom-in-molecule-based fragments (amons) it contains compared with the ANI family of databases. A quantitative measure for deficiencies is the Kullback–Leibler divergence between reference and target distributions. It is explicitly demonstrated that when certain types of bonds need to be covered in the target database (Tautobase) but are undersampled in the reference databases, the resulting predictions are poor. Examples of this include the poor performance of all databases analyzed to predict C(sp2)–C(sp2) double bonds close to heteroatoms and azoles containing N–N and N–O bonds. Analysis of the results with a Tree MAP algorithm provides deeper understanding of specific deficiencies in predicting tautomerization energies by the reference datasets due to inadequate coverage of chemical space. This information can be used to either improve existing databases or generate new databases of sufficient diversity for a range of machine learning (ML) applications in chemistry.
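The Kullback–Leibler divergence used in this abstract as a measure of coverage deficiency can be illustrated for discrete feature counts, such as how often each bond type occurs. This is a minimal sketch under stated assumptions: the abstract does not say which direction of the divergence is used or how the distributions are binned, so the code below computes D(target ‖ reference) over categorical counts, and the small floor `eps` for categories missing from the reference is a crude, hypothetical choice rather than the authors' treatment.

```python
import math


def kl_divergence(target_counts, reference_counts, eps=1e-9):
    """D_KL(target || reference) in nats for two dicts of category counts.

    Assumption: categories absent from the reference get probability eps
    (not renormalized), so undersampled features yield a large divergence.
    """
    target_total = sum(target_counts.values())
    reference_total = sum(reference_counts.values())
    if target_total <= 0 or reference_total <= 0:
        raise ValueError("both distributions need at least one count")
    divergence = 0.0
    for category, count in target_counts.items():
        p = count / target_total
        if p == 0.0:
            continue  # terms with p = 0 contribute nothing
        q = reference_counts.get(category, 0) / reference_total
        divergence += p * math.log(p / max(q, eps))
    return divergence
```

A target that needs a bond type the reference never samples, for example `kl_divergence({"N-O": 1}, {"C-C": 10})`, returns a large value, which matches the qualitative point above: undersampled bond types in the reference database signal where predictions are likely to be poor.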
Glycosaminoglycans (GAGs) are a family of anionic carbohydrates that play an essential role in the physiology and pathology of all eukaryotic life forms. Experimental determination of GAG–protein complexes is challenging due to their difficult isolation from biological sources, natural heterogeneity, and conformational flexibility, including possible ring puckering of sulfated iduronic acid from 1C4 to 2SO conformation. To overcome these challenges, we present GlycoTorch Vina (GTV), a molecular docking tool based on the carbohydrate docking program VinaCarb (VC). Our program is unique in that it contains parameters to model 2SO sugars while also supporting glycosidic linkages specific to GAGs. We discuss how crystallographic models of carbohydrates can be biased by the choice of refinement software and structural dictionaries. To overcome these variations, we carefully curated 12 of the best available GAG and GAG-like crystal structures (ranging from tetra- to octasaccharides or longer) obtained from the PDB-REDO server and refined using the same protocol. Both GTV and VC produced pose predictions with a mean root-mean-square deviation (RMSD) of 3.1 Å from the native crystal structure, a statistically significant improvement when compared to AutoDock Vina (4.5 Å) and the commercial software Glide (5.9 Å). Examples of how real-space correlation coefficients can be used to better assess the accuracy of docking pose predictions are given. Statistical distributions of empirical “salt bridge” interactions, relevant to GAGs, were compared to density functional theory (DFT) studies of model salt bridges and water-mediated salt bridges; however, there was generally poor agreement between these data. Water bridges appear to play an important, yet poorly understood, role in the structures of GAG–protein complexes. To aid in the rapid prototyping of future pose scoring functions, we include a module that allows users to include their own torsional and nonbonded parameters.
By drawing analogies from the dimerization of cyclopentadiene, a novel reaction pathway bifurcation is uncovered in the cycloaddition of oxidopyrylium ylides and butadiene. Analysis of the potential energy surface (at the M06-2X/6-311+G(d,p) level of theory) in combination with Born−Oppenheimer molecular dynamics simulations (M06-2X/6-31+G(d)) demonstrates that both the (4 + 3)- and (5 + 2)-cycloaddition products are accessed from the same transition state. Key indicators of a pathway bifurcation (asynchronous bond formation, and a second transition state for the interconversion of the products) are also observed. The absence of a post-transition state bifurcation in the related oxidopyridinium systems of Krenske and Harmata is rationalized. Finally, the isosymmetry of the oxidopyrylium and cyclopentadiene molecular orbitals as well as the presence of "secondary orbital interactions" are emphasized as the common source of nonstatistical behavior. Application of these principles will allow for the rapid identification of new reaction pathway bifurcations.
Vibrational spectroscopy in supersonic jet expansions is a powerful tool to assess molecular aggregates in close to ideal conditions for the benchmarking of quantum chemical approaches. The low temperatures achieved...
Glycosaminoglycan (GAG) mimetics are synthetic or semi-synthetic analogues of heparin or heparan sulfate, which are designed to interact with GAG binding sites on proteins. The preclinical stages of drug development rely on efficacy and toxicity assessment in animals and aim to apply these findings to clinical studies. However, such data may not always reflect the human situation possibly because the GAG binding site on the protein ligand in animals and humans could differ. Possible inter-species differences in the GAG-binding sites on antithrombin III, heparanase, and chemokines of the CCL and CXCL families were examined by sequence alignments, molecular modelling and assessment of surface electrostatic potentials to determine if one species of laboratory animal is likely to result in more clinically relevant data than another. For each protein, current understanding of GAG binding is reviewed from a protein structure and function perspective. This combinatorial analysis shows chemokine dimers and oligomers can present different GAG binding surfaces for the same target protein, whereas a cleft-like GAG binding site will differently influence the types of GAG structures that bind and the species preferable for preclinical work. Such analyses will allow an informed choice of animal(s) for preclinical studies of GAG mimetic drugs.