Molecular docking computationally predicts the conformation of a small molecule when binding to a receptor. Scoring functions are a vital piece of any molecular docking pipeline as they determine the fitness of sampled poses. Here we describe and evaluate the 1.0 release of the Gnina docking software, which utilizes an ensemble of convolutional neural networks (CNNs) as a scoring function. We also explore an array of parameter values for Gnina 1.0 to optimize docking performance and computational cost. Docking performance, as evaluated by the percentage of targets where the top pose is better than 2Å root mean square deviation (Top1), is compared to AutoDock Vina scoring when utilizing explicitly defined binding pockets or whole protein docking. Gnina, utilizing a CNN scoring function to rescore the output poses, outperforms AutoDock Vina scoring on redocking and cross-docking tasks when the binding pocket is defined (Top1 increases from 58% to 73% and from 27% to 37%, respectively) and when the whole protein defines the binding pocket (Top1 increases from 31% to 38% and from 12% to 16%, respectively). The derived ensemble of CNNs generalizes to unseen proteins and ligands and produces scores that correlate well with the root mean square deviation to the known binding pose. We provide the 1.0 version of Gnina under an open source license for use as a molecular docking tool at https://github.com/gnina/gnina.
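The Top1 metric described above (the percentage of targets whose top-ranked pose is within 2 Å RMSD of the known pose) can be sketched as follows. This is a minimal illustration, not Gnina's own evaluation code: the function name `top1_success_rate`, the input layout, and the example data are all hypothetical, and it assumes that a higher score means a better-ranked pose.

```python
from typing import Dict, List, Tuple

# Each pose is (score, rmsd_to_known_pose_in_angstroms).
# Assumption: higher score = better rank (e.g. a CNN pose score).
Pose = Tuple[float, float]


def top1_success_rate(poses_by_target: Dict[str, List[Pose]],
                      rmsd_cutoff: float = 2.0) -> float:
    """Percentage of targets whose top-scored pose has RMSD below the cutoff."""
    if not poses_by_target:
        raise ValueError("no targets supplied")
    hits = 0
    for poses in poses_by_target.values():
        # Pick the pose with the best (highest) score for this target.
        _, top_rmsd = max(poses, key=lambda pose: pose[0])
        if top_rmsd < rmsd_cutoff:
            hits += 1
    return 100.0 * hits / len(poses_by_target)


# Made-up example data, for illustration only.
example = {
    "target_a": [(0.9, 1.2), (0.4, 5.0)],  # top pose at 1.2 Å: success
    "target_b": [(0.8, 3.5), (0.3, 1.0)],  # top pose at 3.5 Å: failure
    "target_c": [(0.7, 1.9), (0.6, 0.5)],  # top pose at 1.9 Å: success
    "target_d": [(0.5, 2.0)],              # exactly 2.0 Å is not below cutoff
}
top1 = top1_success_rate(example)
print(f"Top1 = {top1:.1f}%")
```

For a scoring function where lower is better (such as an energy-like score), the selection would use `min` instead of `max`.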
One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and there does not exist a standard dataset of sufficient size to compare performance between models. We present a new dataset for structure-based machine learning, the CrossDocked2020 set, with 22.5 million poses of ligands docked into multiple similar binding pockets across the Protein Data Bank and perform a comprehensive evaluation of grid-based convolutional neural network models on this dataset. We also demonstrate how the partitioning of the training data and test data can impact the results of models trained with the PDBbind dataset, how performance improves by adding more, lower-quality training data, and how training with docked poses imparts pose sensitivity to the predicted affinity of a complex. Our best performing model, an ensemble of 5 densely connected convolutional networks, achieves a root mean squared error of 1.42 and Pearson R of 0.612 on the affinity prediction task, an AUC of 0.956 at binding pose classification, and a 68.4% accuracy at pose selection on the CrossDocked2020 set. By providing data splits for clustered cross-validation and the raw data for the CrossDocked2020 set, we establish the first standardized dataset for training machine learning models to recognize ligands in non-cognate target structures while also greatly expanding the number of poses available for training. In order to facilitate community adoption of this dataset for benchmarking protein-ligand binding affinity prediction, we provide our models, weights, and the CrossDocked2020 set at https://github.com/gnina/models.
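The two affinity-prediction metrics quoted above, root mean squared error and Pearson R, can be computed as in the sketch below. It uses only the standard library; the function names and the sample values are hypothetical and are not taken from the CrossDocked2020 evaluation scripts.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Made-up affinities, for illustration only.
predicted_affinity = [2.0, 4.0, 6.0]
measured_affinity = [1.0, 2.0, 3.0]
print(rmse(predicted_affinity, measured_affinity))
print(pearson_r(predicted_affinity, measured_affinity))
```

The made-up example is perfectly correlated yet has a large error, which illustrates why the two metrics are reported together: Pearson R captures ranking and linear trend, while RMSE captures absolute accuracy.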
The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic...
Obesity is known to decrease the efficacy of neoadjuvant chemotherapy (NAC) against breast cancer; however, the relationship between actual body composition and NAC outcomes remains unknown. Therefore, we determined the effect of body composition on NAC outcomes. A total of 172 advanced breast cancer patients who underwent surgery after NAC were retrospectively analyzed. Body composition parameters including abdominal circumference (AC), subcutaneous fat area (SFA), visceral fat area (VFA), and skeletal muscle area (SMA) were calculated using computed tomography volume‐analyzing software. VFA/SFA ratio was used to evaluate visceral obesity. The associations of body composition parameters with pathological complete remission (pCR) and survival were analyzed. AC, SFA, and VFA were significantly correlated with body mass index (BMI) (all P < 0.05; r = 0.82, r = 0.71, and r = 0.78, respectively). AC, SFA, and VFA increased significantly and SMA decreased significantly after menopause (all P < 0.05). VFA/SFA ratio increased significantly after menopause, even though BMI remained unchanged. Body composition parameters were not associated with pCR. Distant disease‐free survival (DDFS) was significantly worse in the high VFA group than in the low VFA group (P < 0.05). Furthermore, in the high VFA group, postmenopausal patients had significantly shorter DDFS than premenopausal patients (P < 0.05). VFA was independently associated with DDFS in the multivariate analysis (P < 0.05). High visceral fat is associated with worse NAC outcomes in breast cancer patients, especially postmenopausal patients. Interventions targeting visceral fat accumulation will likely improve NAC outcomes.
We identified two novel naturally occurring mutations (W74L and L77R) in the small S envelope protein of hepatitis B virus (HBV). Mutation L77R alone resulted in >10-fold-reduced secretion of virions. In addition, the 2.8-fold reduction of the extracellular HBV surface antigen (HBsAg) of mutant L77R from transfected Huh7 cells appeared to be correlated with a 1.7-fold reduction of intracellular HBsAg, as measured by enzyme-linked immunosorbent assay (ELISA). Surprisingly, opposite to the ELISA results, Western blot analysis revealed a near-10-fold-increased level of the intracellular mutant small S envelope protein. The discrepancy between ELISA and Western blot data was due to significant accumulation of the mutant L77R HBsAg in the intracellular pellet fraction. In contrast to HBsAg, the secretion of HBeAg was normal in L77R-transfected cells. The wild-type HBsAg was usually more diffuse and evenly distributed in the cytoplasm, often outside the perinuclear endoplasmic reticulum (ER) and Golgi apparatus, as observed by immunofluorescence assay. In contrast, the L77R mutant HBsAg tended to be highly restricted within the ER and Golgi, often accumulated in the Golgi compartments distal from the nucleus. The almost exclusive retention in the ER-Golgi of L77R HBsAg was similar to what was observed when the large envelope protein was overexpressed. These multiple aberrant phenotypes of mutant L77R can be corrected by a second naturally occurring S envelope mutation, W74L. Despite the accumulation of L77R HBsAg in ER-Golgi of transfected Huh7 cells, we detected no increase in Grp78 mRNA and proteins, which are common markers for ER stress response. Hepatitis B virus (HBV) is a major human pathogen. Chronic infection with HBV leads to the development of cirrhosis and hepatocellular carcinoma (2, 16, 36). HBV variants are often found in chronically infected patients (19, 37).
The most common naturally occurring mutation in human HBV core protein is at amino acid (aa) 97, changing a highly conserved isoleucine (HBsAg subtype adr) or phenylalanine (HBsAg subtype ayw) to a leucine (L) (3, 12, 13, 14, 15, 20). In contrast to the established dogma of preferential virion secretion of mature genome for wild-type (WT) hepadnaviruses (17, 33, 40, 44, 47, 48), the 97L mutation results in secretion of virions containing an immature genome into the medium and is characterized by excessive amounts of minus-strand DNA (47, 48). Even though the immature secretion phenotype has been observed with woodchuck and snow goose hepadnaviruses (7, 42), it has not been reported with human patients. This may be due to the presence of naturally occurring compensatory mutations for 97L in the core protein at positions 5 (11) or 130 (49), both changing a highly conserved proline to threonine. HBV surface antigens (HBsAg) consist of three structurally related large (L), middle (M), and small (S) envelope proteins. These proteins share a common carboxyl terminus, with the L protein containing pre-S1, pre-S2, and small S domains, and the M envelope protein conta...