Drug repurposing is a valuable tool for combating the slowing rates of novel therapeutic discovery. The Computational Analysis of Novel Drug Opportunities (CANDO) platform performs shotgun repurposing for 2030 indications/diseases using 3733 drugs/compounds, predicting their interactions with 46,784 proteins and relating the compounds to one another via proteomic interaction signatures. The accuracy is calculated by comparing interaction similarities of drugs approved for the same indications. We performed a unique subset analysis by breaking down the full protein library into smaller subsets and then recombining the best performing subsets into larger supersets. Benchmarking the supersets shows up to a 14% improvement in accuracy, while the supersets represent a 100–1000-fold reduction in the number of proteins considered relative to the full library. Further analysis revealed that libraries composed of proteins with more equitably diverse ligand interactions are important for describing compound behavior. Using one of these libraries to generate putative drug candidates against malaria, tuberculosis, and large cell carcinoma yields more candidates that could be validated in the biomedical literature than using the full protein library. Our work elucidates the role that particular protein subsets and their corresponding ligand interactions play in drug repurposing, with implications for drug design and machine learning approaches to improve the CANDO platform.
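The subset-then-superset idea can be illustrated with a small sketch. This is not the CANDO implementation: the function names, the consecutive partitioning of protein indices, and the toy scoring function are all assumptions made for the example, and the abstract does not say how the subsets were actually formed. Here `score_fn` stands in for whatever benchmarking accuracy is obtained when the interaction signatures are restricted to a given subset of proteins.

```python
def split_into_subsets(n_proteins, subset_size):
    """Partition protein column indices 0..n_proteins-1 into consecutive subsets.

    Consecutive partitioning is an assumption made for this illustration.
    """
    indices = list(range(n_proteins))
    return [indices[i:i + subset_size] for i in range(0, n_proteins, subset_size)]


def build_superset(subsets, score_fn, n_best):
    """Rank subsets by score (highest first) and merge the n_best into one superset."""
    ranked = sorted(subsets, key=score_fn, reverse=True)
    return sorted(idx for subset in ranked[:n_best] for idx in subset)


# Hypothetical per-protein "informativeness" values, used only to make the
# example runnable. A real analysis would benchmark each subset instead.
informativeness = [0.1, 0.2, 0.9, 0.8, 0.3, 0.1, 0.7, 0.6]


def toy_score(subset):
    """Stand-in for benchmarking accuracy: mean informativeness of the subset."""
    return sum(informativeness[i] for i in subset) / len(subset)


subsets = split_into_subsets(n_proteins=8, subset_size=2)
superset = build_superset(subsets, toy_score, n_best=2)
print(superset)
```

In this toy case the two highest-scoring subsets are merged, so the superset holds half of the original proteins. The reported 100–1000-fold reduction corresponds to keeping a far smaller fraction of the 46,784-protein library.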
Background: Drug discovery is an arduous process that requires many years and billions of dollars before approval for patient use. However, there are a number of drugs and human ingestibles approved for a variety of indications/diseases that can potentially be repurposed as new treatments for others, decreasing the time and cost required. Methods: CANDO (Computational Analysis of Novel Drug Opportunities) is a platform for shotgun, multitarget drug discovery and repurposing. The CANDO platform scores interactions between 46,784 protein structures and 3,733 human-use compounds using a bioinformatic docking protocol to generate compound-proteome interaction signatures that are then compared to identify candidates for repurposing. Benchmarking of the platform is accomplished by comparing the compound-proteome interaction signatures and determining whether signatures corresponding to pairs of drugs approved for the same indication fall within particular cutoffs. Results: We have altered the scoring function of the bioinformatic docking protocol in the newest version of our platform (v1.5) to use the best OBscore for each compound-protein interaction, resulting in an increased benchmarking accuracy from 11.7% in v1 to 12.8% in v1.5 for the top10 cutoff, the most stringent one used, and correspondingly from 24.9% to 31.2% for the top100 cutoff. Conclusions: The change in the interaction scoring and other bug fixes in CANDO v1.5 have resulted in improved benchmarking performance, making the platform more effective at predicting novel therapeutic drug-indication pairs.
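The cutoff-based benchmarking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's code: the toy signatures, the drug and indication names, the use of root-mean-square deviation as the distance between signatures, and the function names are all hypothetical. The sketch counts a hit when another drug approved for the same indication appears among the k compounds with the most similar signatures.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def top_k_accuracy(signatures, indications, k):
    """Fraction of (drug, indication) pairs for which another drug approved for
    the same indication is among the k most similar compounds."""
    hits = 0
    total = 0
    for drug, sig in signatures.items():
        # Rank every other compound by distance to this drug's signature.
        ranked = sorted(
            (other for other in signatures if other != drug),
            key=lambda other: rmsd(sig, signatures[other]),
        )
        top = set(ranked[:k])
        for approved in indications.values():
            if drug not in approved:
                continue
            partners = approved - {drug}
            if not partners:
                continue  # an indication with a single drug cannot be benchmarked
            total += 1
            if top & partners:
                hits += 1
    return hits / total if total else 0.0


# Toy data (hypothetical): four compounds scored against three proteins.
signatures = {
    "drug_a": [0.9, 0.1, 0.0],
    "drug_b": [0.8, 0.2, 0.1],
    "drug_c": [0.0, 0.9, 0.7],
    "drug_d": [0.1, 0.8, 0.9],
}
indications = {
    "indication_1": {"drug_a", "drug_b"},
    "indication_2": {"drug_c", "drug_d"},
    "indication_3": {"drug_a", "drug_d"},
}

print(top_k_accuracy(signatures, indications, k=1))
print(top_k_accuracy(signatures, indications, k=3))
```

A smaller k is a stricter test, which is consistent with the top10 accuracies reported above being lower than the top100 accuracies.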
Motivation: Elucidating drug-protein interactions is essential for understanding the beneficial effects of small molecule therapeutics in human disease states. Common drug discovery methods focus on optimizing the efficacy of a drug against a single biological target of interest. However, evidence supports the multitarget theory, i.e., drugs work by exerting their therapeutic effects via interaction with multiple biological targets. Analyzing drug interactions with a library of proteins can provide further insight into disease systems while also allowing for prediction of putative therapeutics. Results: We present the CANDO Python package for analysis of drug-proteome and drug-disease relationships. This package allows for rapid drug similarity assessment, most notably via the bioinformatic docking protocol in which protein interactions can be quickly scored for thousands of compounds. The platform can be benchmarked through a variety of protocols to determine how well drugs are related to each other in terms of the indications/diseases for which they are approved. Drug predictions are generated through consensus scoring of the most similar compounds to drugs known to treat a particular indication. Availability: The CANDO Python package is available on GitHub at https://github.com/ram-compbio/CANDO, through the Conda Python package installer, and at