The drug discovery process can be significantly improved through understanding how the structure of chemical compounds relates to their function. A common paradigm that has been used to filter and prioritize compounds is ligand-based virtual screening, where large libraries of compounds are queried for high structural similarity to a target molecule, with the assumption that structural similarity is predictive of similar biological activity. Although the chemical informatics community has already proposed a wide range of structure descriptors and similarity coefficients, a major challenge has been the lack of systematic and unbiased benchmarks for biological activity that cover a broad range of targets to definitively assess the performance of the alternative approaches.

We leveraged a large set of chemical-genetic interaction data from the yeast Saccharomyces cerevisiae that our labs have recently generated, covering more than 13,000 compounds from the RIKEN NPDepo and several NCI, NIH, and GlaxoSmithKline (GSK) compound collections.

Supportive of the idea that chemical-genetic interaction data provide an unbiased proxy for biological functions, we found that many commonly used structural similarity measures were able to predict the compounds that exhibited similar chemical-genetic interaction profiles, although these measures did exhibit significant differences in performance. Using the chemical-genetic interaction profiles as a basis for our evaluation, we performed a systematic benchmarking of 10 different structure descriptors, each combined with 12 different similarity coefficients. We found that the All-Shortest Path (ASP) structure descriptor paired with the Braun-Blanquet similarity coefficient provided superior performance that was robust across several different compound collections.
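To make the role of a similarity coefficient concrete, the following is a minimal sketch of the Braun-Blanquet coefficient on binary fingerprints, with the Tanimoto coefficient shown for comparison. It assumes each fingerprint is represented as a set of "on" bit indices; the function names, the toy fingerprints, and the convention of returning 0.0 for two empty fingerprints are illustrative assumptions, not details taken from this work, and no ASP fingerprint generation is shown.

```python
def braun_blanquet(fp_a: set, fp_b: set) -> float:
    """Shared on-bits divided by the on-bit count of the larger fingerprint."""
    larger = max(len(fp_a), len(fp_b))
    if larger == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(fp_a & fp_b) / larger


def tanimoto(fp_a: set, fp_b: set) -> float:
    """Shared on-bits divided by the on-bits present in either fingerprint."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return len(fp_a & fp_b) / union


# Hypothetical toy fingerprints (sets of on-bit indices), for illustration only.
query = {1, 4, 7, 9}
candidate = {1, 4, 8}

print(braun_blanquet(query, candidate))  # 2 shared bits / 4 bits in the larger set
print(tanimoto(query, candidate))        # 2 shared bits / 5 bits in the union
```

Because the denominator is the size of the larger fingerprint, the Braun-Blanquet value never exceeds the Tanimoto-style overlap with the smaller fingerprint, and it penalizes pairs whose fingerprints differ greatly in size.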
We further describe a machine learning approach that improves the ability of the ASP metric to capture biological activity. We used the ASP fingerprints as input for several supervised machine learning models and the chemical-genetic interaction profiles as the standard for learning. We found that the predictive power of the ASP fingerprints (as well as several other descriptors) could be substantially improved by using support vector machines. For example, on held-out data, we measured a 5-fold improvement in the recall of biologically similar compounds at a precision of 50% based upon the ASP fingerprints. Our results generally suggest that using high-dimensional chemical-genetic data as a basis for refining chemical structure descriptors can be a powerful approach to improving prediction of biological function from structure.
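The "recall at a precision of 50%" figure can be read as follows: rank compound pairs by predicted similarity, then report the largest recall reached at any cutoff where at least half of the retrieved pairs are truly biologically similar. The sketch below is one plausible way to compute such a number in plain Python. The function name, the toy scores and labels, and the handling of tied scores (none) are assumptions made for illustration; the exact evaluation protocol is not specified in this abstract.

```python
def recall_at_precision(scores, labels, min_precision=0.5):
    """Largest recall over ranked cutoffs whose precision is >= min_precision.

    scores: predicted similarity per compound pair (higher = more similar)
    labels: 1 if the pair is biologically similar, else 0
    """
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    # Rank pairs from most to least similar according to the predictor.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    true_positives = 0
    for k, (_, is_positive) in enumerate(ranked, start=1):
        true_positives += is_positive
        precision = true_positives / k
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall


# Hypothetical toy data: seven ranked pairs, three of them truly similar.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]
labels = [1, 0, 1, 0, 0, 0, 1]

print(recall_at_precision(scores, labels, min_precision=0.5))
```

Comparing this value for two scoring methods on the same held-out pairs gives a fold improvement of the kind reported above.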