Current drug discovery and development approaches rely extensively on the identification and validation of appropriate targets, that is, targets that can yield marketable and robust therapeutics. Wide-ranging efforts have been directed at this problem, and various approaches have been developed to identify disease-associated genes as candidates. In this work, we show with statistical significance that successful drug targets, in addition to their linkage to disease, share common characteristics that are disease-independent. For example, marked differences in functional category, tissue specificity, and sequence variability are observed between known targets and average proteins. These results lead to an interesting hypothesis: potentially good drug targets should have certain desirable properties beyond their disease associations, which we refer to as "drug target-likeness". Because comprehensive protein characteristics data are of limited availability, we tried to learn the drug target-likeness property at the sequence level. Results show that a support vector machine model is able to accurately distinguish targets from nontargets using sequence features alone. It is our hope that these encouraging results will invite future systematic proteome-scale experiments to gather the protein characteristics data necessary for an accurate and predictive definition of "drug target-likeness", providing a new perspective toward understanding and pursuing effective therapeutics.
The 37 molecular descriptors were selected using a hybrid filter/wrapper approach that combines a Fisher score with Monte Carlo simulated annealing. Classification models for the acetylcholinesterase inhibitors were then built using support vector machine (SVM), artificial neural network (ANN), and k-nearest neighbor (k-NN) methods. For the 515 samples in the training set, we obtained average prediction accuracies of 87.3%-92.7%, 67.0%-81.0%, and 79.4%-88.2% for the positive, the negative, and the total samples, respectively, by 5-fold cross-validation. Average prediction accuracies of 72.7%-82.5%, 41.0%-53.0%, and 62.1%-69.1% were obtained for the positive, the negative, and the total samples, respectively, by the y-scrambling method, indicating that there was no chance correlation in our models. An external test was conducted on 172 samples that were not used for model building, and we obtained prediction accuracies of 93.3%-100.0%, 74.6%-89.6%, and 86.1%-95.9% for the positive, the negative, and the total samples, respectively. The prediction accuracies obtained by all the machine learning methods, especially the SVM method, were far better than previously reported results.
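The filter step and the y-scrambling check described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the Fisher score uses one common two-class form, (mean_pos - mean_neg)^2 / (var_pos + var_neg), which the abstract does not spell out; the Monte Carlo simulated annealing wrapper is omitted; a plain k-NN stands in for the classifiers; and the data are synthetic. All function names (`fisher_score`, `cv_accuracy`, `y_scrambled_accuracy`, `make_toy_data`) are hypothetical.

```python
import random
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Two-class Fisher score (assumed form): squared mean gap over summed variances."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Convention chosen for this sketch only.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    votes = sum(y for _, y in dists[:k])
    return 1 if 2 * votes > k else 0


def cv_accuracy(X, y, folds=5, k=3):
    """Overall accuracy from a simple strided k-fold cross-validation."""
    n = len(X)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        correct += sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
    return correct / n


def y_scrambled_accuracy(X, y, rounds=20, seed=0):
    """Mean cross-validated accuracy after randomly permuting the labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(rounds):
        shuffled = list(y)
        rng.shuffle(shuffled)
        scores.append(cv_accuracy(X, shuffled))
    return mean(scores)


def make_toy_data(n=100, seed=1):
    """Synthetic data: descriptor 0 tracks the label, descriptor 1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for j in range(2):
        print("descriptor", j, "Fisher score:", fisher_score([r[j] for r in X], y))
    print("5-fold CV accuracy:", cv_accuracy(X, y))
    print("y-scrambled accuracy:", y_scrambled_accuracy(X, y))
```

On the toy data, the informative descriptor should receive a much higher Fisher score than the noise descriptor, and the accuracy on scrambled labels should sit near chance while the accuracy on the true labels stays well above it. That gap is the pattern a y-scrambling test looks for when ruling out chance correlation.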