To minimize expensive drug failures, it is essential to identify potential activity, toxicity, and ADME problems as early as possible. In view of the large compound libraries now handled by combinatorial chemistry and high-throughput screening, it is advisable to identify potential drugs even before synthesis, using computational techniques such as QSAR modeling. A great number of in silico approaches to activity/toxicity prediction have been described in the literature, using 0D, 1D, 2D, and 3D molecular descriptors. These descriptors have also been implemented in available computational tools such as DRAGON, SYBYL, and CODESSA for ease of use. However, many of them have been applied to only a few prediction problems. This review attempts to summarize present knowledge on the computational prediction of biological activity based on 2D molecular descriptors implemented in the DRAGON software. These applications rely on new computational techniques such as virtual combinatorial synthesis, virtual computational screening or inverse. Several applications of topological molecular descriptors are described, ranging from simple topological indices to topological indices derived from matrices weighted with atomic and bond properties. Their advantages, limitations, and possibilities in drug design are also discussed.
Variable selection is a procedure for choosing the most important features, so that as much information as possible is obtained from a reduced number of features. The selection stage is crucial: the subsequent quantitative structure-activity relationship (QSAR) model (regression or discriminant) will perform poorly if features of little significance are selected. In the modern era of drug design, combinatorial chemistry and high-throughput screening have generated an unprecedented amount of experimental information. In addition, many molecular descriptors have been defined in the last two decades. All this information can be analyzed by QSAR techniques using adequate statistical procedures. These techniques and procedures should be fast, automated, and applicable to large data sets of structurally diverse compounds. For that reason, identifying the best one is a very difficult task, given the large number of variable selection techniques now available. This review summarizes some of the present knowledge concerning variable selection methods applied to well-known statistical techniques such as linear regression, PLS, kNN, and artificial neural networks, with the aim of disseminating advances in this important stage of QSAR model building.
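As a rough illustration of what a variable selection step does, the sketch below ranks descriptors by the absolute Pearson correlation with the measured activity and keeps the top k. This simple filter approach is an assumption chosen for brevity, not a method singled out by the review; the descriptor names (`d_linear`, `d_noise`, `d_constant`) and the toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant descriptor carries no information
    return cov / (sx * sy)

def select_descriptors(descriptors, activity, k):
    """Return the names of the k descriptors most correlated with activity."""
    ranked = sorted(
        descriptors,
        key=lambda name: abs(pearson(descriptors[name], activity)),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy data: three made-up descriptors for five compounds.
descriptors = {
    "d_linear": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d_noise": [2.0, 1.0, 2.0, 1.0, 2.0],
    "d_constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
activity = [2.1, 3.9, 6.2, 8.0, 9.9]

selected = select_descriptors(descriptors, activity, k=1)
print(selected)
```

A filter of this kind scores each descriptor independently, so it is fast on large descriptor pools but ignores redundancy between descriptors; wrapper methods that refit the model for each candidate subset address that at a higher computational cost.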
A novel application of TOPological Substructural MOlecular DEsign (TOPS-MODE) to antibacterial drugs was carried out using computer-aided molecular design. Two series of compounds, one containing antibacterial and the other non-antibacterial compounds, were processed by k-means cluster analysis in order to design the training and prediction series. All clusters had a p-level < 0.005. A linear classification function was then derived to discriminate between antibacterial and non-antibacterial compounds. The model correctly classifies 94% of active and 86% of inactive compounds in the training series. More specifically, the model showed a global good classification of 91%, i.e., 263 cases out of 289. In the prediction series, the model showed overall predictabilities of 91% and 83% for active and inactive compounds, respectively. Thus, the model has a global percentage of good classification of 89%. The TOPS-MODE approach also performs comparably to one of the most useful models for antimicrobial selection reported to date.
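The cluster-based design of training and prediction series described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the plain Lloyd's k-means, the fixed initial centroids, the two-dimensional toy points, and the rule of sending every third member of each cluster to the prediction series are all hypothetical choices.

```python
def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])),
            )
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def split_by_cluster(labels, every=3):
    """Send every `every`-th member of each cluster to the prediction series."""
    train, predict = [], []
    seen = {}
    for idx, lab in enumerate(labels):
        seen[lab] = seen.get(lab, 0) + 1
        (predict if seen[lab] % every == 0 else train).append(idx)
    return train, predict

# Hypothetical descriptor vectors for six compounds forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = kmeans(points, initial_centroids=[points[0], points[3]])
train_idx, predict_idx = split_by_cluster(labels, every=3)
print(labels, train_idx, predict_idx)
```

Drawing the prediction series from every cluster keeps both series representative of the same regions of descriptor space, which is the purpose of clustering before the split.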