We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently, whereas ILP combines structural fragments to create new features with higher predictive power. SVILP is a recently introduced method that applies a support vector machine after the usual ILP procedure. The performance of the methods is evaluated with a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is the superior method for seven of the 11 classes. The results show that the Bayes Classifier gives the best recall for eight of the 11 targets, but has much lower precision, specificity and F-measure. The SVILP model, on the other hand, has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test, which shows that SVILP performs significantly (p < 0.05) better than both other methods for six of the 11 activity classes, and is superior at a lower level of significance for three of the remaining classes. While the Bayes Classifier was previously shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.
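As an illustration of the measures named above, the following is a minimal sketch that computes recall, specificity, precision, F-measure, Matthews Correlation Coefficient and an enrichment factor from confusion-matrix counts, together with a McNemar statistic for comparing two classifiers on the same compounds. The function names, the choice of the continuity-corrected form of McNemar's test, and the enrichment-factor definition (hit rate in the selected subset divided by the hit rate in the whole set) are assumptions made for this example; the abstract does not state which variants the study used.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Binary-classification measures from confusion-matrix counts.

    Ratios with a zero denominator are reported as 0.0 (a convention
    chosen for this sketch, not taken from the study).
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + recall
    f_measure = 2 * precision * recall / pr_sum if pr_sum else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


def enrichment_factor(actives_selected, n_selected, actives_total, n_total):
    """Hit rate among the selected compounds relative to the overall hit rate."""
    return (actives_selected / n_selected) / (actives_total / n_total)


def mcnemar_chi2(b, c):
    """Continuity-corrected McNemar statistic (1 degree of freedom).

    b: compounds classifier A got right and classifier B got wrong
    c: compounds classifier B got right and classifier A got wrong
    """
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)


# Critical value of the chi-squared distribution with 1 degree of
# freedom at the 5% level; statistics above it indicate p < 0.05.
CHI2_CRITICAL_5PCT = 3.841

if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=40, fp=10, tn=940, fn=10))
    print(enrichment_factor(40, 50, 50, 1000))
    print(mcnemar_chi2(25, 10) > CHI2_CRITICAL_5PCT)
```

The counts in the usage lines are invented to exercise the functions and do not correspond to any result reported in the study. The example also shows why a classifier with high recall can still score poorly on the F-measure: precision enters the harmonic mean with equal weight.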
In chemoinformatics, searching for compounds which are structurally diverse and share a biological activity is called scaffold hopping. Scaffold hopping is important since it can be used to obtain alternative structures when the compound under development has unexpected side-effects. Pharmaceutical companies use scaffold hopping when they wish to circumvent prior patents for targets of interest. We propose a new method for scaffold hopping using inductive logic programming (ILP). ILP uses the observed spatial relationships between pharmacophore types in pretested active and inactive compounds and learns human-readable rules describing the diverse structures of active compounds. The ILP-based scaffold hopping method is compared to two previous algorithms (chemically advanced template search, CATS, and CATS3D) on 10 data sets with diverse scaffolds. The comparison shows that the ILP-based method is significantly better than random selection while the other two algorithms are not. In addition, the ILP-based method retrieves new active scaffolds which were not found by CATS and CATS3D. The results show that the ILP-based method is at least as good as the other methods in this study. Because ILP produces human-readable rules, it is possible to identify the three-dimensional features that lead to scaffold hopping. A minor variant of a rule learnt by ILP for scaffold hopping was subsequently found to cover an inhibitor identified by an independent study. This amounts to a successful blind trial of the effectiveness of ILP in generating rules for scaffold hopping. We conclude that ILP provides a valuable new approach for scaffold hopping.
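To make the idea of a rule over spatial relationships between pharmacophore types concrete, the following is a minimal sketch of how such a rule could be checked against a molecule. Everything in it is hypothetical: the representation of a molecule as a list of typed 3D points, the rule format (typed variables plus pairwise distance constraints with a tolerance), the function name `rule_covers`, and the example rule and coordinates are assumptions for illustration. They are not rules, data or code from the study, and the sketch only tests whether a given rule covers a molecule; it does not perform ILP learning.

```python
import math
from itertools import product


def rule_covers(molecule, var_types, constraints):
    """Check whether a distance-based pharmacophore rule covers a molecule.

    molecule:    list of (pharmacophore_type, (x, y, z)) points
    var_types:   dict mapping rule variable name -> required pharmacophore type
    constraints: list of (var_a, var_b, distance, tolerance) in the same
                 length unit as the coordinates

    The rule covers the molecule if distinct points can be bound to the
    variables so that every type and distance constraint holds.
    """
    names = list(var_types)
    # For each variable, the indices of points with the required type.
    candidates = [
        [i for i, (ptype, _) in enumerate(molecule) if ptype == var_types[v]]
        for v in names
    ]
    for assignment in product(*candidates):
        if len(set(assignment)) < len(assignment):
            continue  # two variables bound to the same point
        bound = dict(zip(names, assignment))
        if all(
            abs(math.dist(molecule[bound[a]][1], molecule[bound[b]][1]) - d) <= tol
            for a, b, d, tol in constraints
        ):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical molecule and rule, for illustration only.
    molecule = [
        ("donor", (0.0, 0.0, 0.0)),
        ("aromatic", (3.0, 4.0, 0.0)),
        ("acceptor", (0.0, 6.0, 0.0)),
    ]
    # Reads as: "active if a donor A and an aromatic ring B lie 5.0 +/- 0.5 apart".
    var_types = {"A": "donor", "B": "aromatic"}
    constraints = [("A", "B", 5.0, 0.5)]
    print(rule_covers(molecule, var_types, constraints))
```

A rule in this form refers only to pharmacophore types and distances, not to a particular ring system or backbone, which is why compounds with different scaffolds can satisfy the same rule.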