Identifying novel drug-target interactions is a critical and rate-limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, here we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. We then introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training to improve binding predictions for novel proteins and ligands. We validate AI-Bind predictions via docking simulations and comparison with recent experimental evidence, and advance the interpretation of machine learning predictions of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. AI-Bind is a high-throughput approach for identifying drug-target combinations, with the potential to become a powerful tool in drug discovery.
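To make the notion of a topological shortcut concrete, the following is a minimal sketch, not the AI-Bind implementation. It assumes a toy training set of (protein, ligand, label) triples and scores a pair using only how often each node appears in binding versus non-binding pairs, ignoring all molecular features. The function names `degree_ratios` and `shortcut_score`, the averaging rule, and the toy data are hypothetical choices made for illustration.

```python
from collections import defaultdict


def degree_ratios(edges):
    """Map each node to the fraction of its training pairs labeled as binding.

    `edges` is an iterable of (protein, ligand, label) triples, label in {0, 1}.
    Nodes are keyed as ("P", name) or ("L", name) to keep the two sides
    of the bipartite network apart.
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for protein, ligand, label in edges:
        for node in (("P", protein), ("L", ligand)):
            totals[node] += 1
            positives[node] += int(label)
    return {node: positives[node] / totals[node] for node in totals}


def shortcut_score(ratios, protein, ligand, default=0.5):
    """Score a pair from node degree ratios alone (no features used).

    Nodes absent from training fall back to `default`, an assumed
    uninformative prior, so fully novel pairs all get the same score.
    """
    p = ratios.get(("P", protein), default)
    l = ratios.get(("L", ligand), default)
    return (p + l) / 2


# Toy, made-up training annotations: P1 mostly binds, P2 mostly does not.
train = [
    ("P1", "L1", 1), ("P1", "L2", 1), ("P1", "L3", 1),
    ("P2", "L1", 0), ("P2", "L2", 0), ("P2", "L3", 1),
]
ratios = degree_ratios(train)
seen = shortcut_score(ratios, "P1", "L3")   # both nodes seen in training
novel = shortcut_score(ratios, "P9", "L9")  # neither node seen in training
print(seen, novel)
```

A scorer of this kind can look accurate on pairs whose nodes appear in training, yet it carries no information about a never-before-seen protein or ligand, which is the failure mode the abstract describes.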
Objectives: For rheumatoid arthritis (RA) patients failing to achieve treatment targets with conventional synthetic disease-modifying antirheumatic drugs, tumor necrosis factor (TNF)-α inhibitors (anti-TNF therapies) are the primary first-line biologic therapy. In a cross-cohort, cross-platform study, we developed a molecular test that predicts inadequate response to anti-TNF therapies in biologic-naive RA patients. Materials and Methods: To identify predictive biomarkers, we developed a comprehensive human interactome (a map of pairwise protein-protein interactions) and overlaid RA genomic information to generate a model of disease biology. Using this map of RA and machine learning, we developed a predictive classification algorithm that integrates clinical disease measures, whole-blood gene expression data, and disease-associated transcribed single-nucleotide polymorphisms to identify those individuals who will not achieve an ACR50 improvement in disease activity in response to anti-TNF therapy. Results: Data from two patient cohorts (n = 58 and n = 143) were used to build a drug response biomarker panel that predicts nonresponse to anti-TNF therapies in RA patients before the start of treatment. In a validation cohort (n = 175), the drug response biomarker panel identified nonresponders with a positive predictive value of 89.7% and a specificity of 86.8%. Conclusions: Across gene expression platforms and patient cohorts, this drug response biomarker panel stratifies biologic-naive RA patients into subgroups based on their probability of responding or not responding to anti-TNF therapies. Clinical implementation of this predictive classification algorithm could direct nonresponder patients to alternative targeted therapies, thus reducing treatment regimens that involve multiple trial-and-error attempts with anti-TNF drugs.
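For readers less familiar with the two metrics reported above, the following is a minimal sketch of how positive predictive value and specificity are computed from confusion-matrix counts. It assumes that "positive" means a predicted nonresponder, as in the abstract. The counts are hypothetical and are not the study's data.

```python
def positive_predictive_value(tp, fp):
    """Fraction of predicted nonresponders who are true nonresponders."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


def specificity(tn, fp):
    """Fraction of true responders correctly not flagged as nonresponders."""
    if tn + fp == 0:
        raise ValueError("no actual negatives")
    return tn / (tn + fp)


# Hypothetical counts for illustration only (positive = nonresponder).
tp, fp, tn, fn = 30, 10, 40, 20
ppv = positive_predictive_value(tp, fp)
spec = specificity(tn, fp)
print(f"PPV = {ppv:.1%}, specificity = {spec:.1%}")
```

Both metrics share the false-positive count: a test with high positive predictive value and high specificity rarely labels a true responder as a nonresponder, which matters when the label is used to steer patients away from a therapy.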
Untargeted metabolomics can detect hundreds of compounds in food, yet without standards it cannot quantify them. Here we show that the universal scaling of nutrient concentrations can be used to estimate the concentration of all chemicals detected by untargeted metabolomics. We validate our method on 20 raw foods, finding excellent agreement between the predicted and the experimentally observed concentrations.
Given the important role food plays in health and wellbeing, the past decades have seen considerable experimental efforts dedicated to mapping the chemical composition of food ingredients. As the composition of raw food is genetically predetermined, here we ask to what degree we can rely on genomics to predict the chemical composition of natural ingredients. We therefore developed tools to unveil the chemical composition encoded in the genomes of 75 edible plants, finding that genome-based annotations increase the number of compounds linked to specific plants by 42 to 100%. We rely on Gibbs free energy to identify compounds that accumulate in plants, i.e., those that are more likely to be detected experimentally. To quantify the accuracy of our predictions, we performed untargeted metabolomics on 13 plants, allowing us to experimentally confirm the detectability of the predicted compounds. For example, we find 59 novel compounds in corn that are predicted by genomic annotations and supported by our experiments, but were not previously assigned to the plant. Our study shows that genome-based annotations can lead to an integrated metabologenomics platform capable of unveiling the chemical composition of edible plants and the biochemical pathways responsible for the observed compounds.