A topological substructural approach to molecular design (TOSS-MODE) has been introduced for the selection and design of anticancer compounds. A quantitative model that discriminates anticancer compounds from inactive ones in a training series was obtained. This model correctly classifies 91.43% of the compounds in an external prediction set, with only 1.43% false actives and 7.14% false inactives. The model was then used to simulate a virtual search for Ras FTase inhibitors; 87% of the Ras FTase inhibitors used in this simulated search were correctly classified, indicating the ability of the TOSS-MODE model to find lead compounds with novel structures and mechanisms of action. Finally, a series of carbonucleosides was designed, and the compounds were classified as active or inactive anticancer compounds using the model developed here. Of the compounds so designed, 20 were synthesized and evaluated experimentally for their antitumor effects on the proliferation of murine leukemia cells (L1210/0) and human T-lymphocyte cells (Molt4/C8 and CEM/0); 80% of these compounds were correctly classified as active or inactive, and only two pairs of isomeric compounds were false actives. The chloropurine derivatives were the most active compounds, especially compounds 6c and 6d.
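The three percentages reported for the external prediction set are fractions of the whole set, so they add up to 100% (91.43 + 1.43 + 7.14 = 100). A minimal sketch of how such rates can be computed from observed and predicted active/inactive labels is given below; the function name `classification_rates` and the ten-compound example are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def classification_rates(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Percent correct, false actives and false inactives over the whole set.

    True means "active", False means "inactive". Every rate is divided by the
    total number of compounds, so the three values sum to 100.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(actual)
    correct = sum(a == p for a, p in zip(actual, predicted))
    # False active: predicted active, but experimentally inactive.
    false_active = sum(p and not a for a, p in zip(actual, predicted))
    # False inactive: predicted inactive, but experimentally active.
    false_inactive = sum(a and not p for a, p in zip(actual, predicted))
    return {
        "correct_%": 100.0 * correct / n,
        "false_active_%": 100.0 * false_active / n,
        "false_inactive_%": 100.0 * false_inactive / n,
    }


if __name__ == "__main__":
    # Hypothetical set: 5 actives followed by 5 inactives.
    observed = [True] * 5 + [False] * 5
    model_out = [True, True, True, True, False, False, False, False, False, True]
    print(classification_rates(observed, model_out))
    # {'correct_%': 80.0, 'false_active_%': 10.0, 'false_inactive_%': 10.0}
```

Reporting errors relative to the whole set (rather than per class) matches the way the abstract's figures sum to 100%.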
Variable selection is a procedure for choosing the most important features so as to retain as much information as possible with a reduced number of features. This stage is crucial: a quantitative structure-activity relationship (QSAR) model (regression or discriminant) built afterwards would perform poorly if few relevant features were selected. In the modern era of drug design, combinatorial chemistry and high-throughput screening have generated an unprecedented amount of experimental information. In addition, many molecular descriptors have been defined in the last two decades. All this information can be analyzed by QSAR techniques using adequate statistical procedures, which should be fast, automated, and applicable to large data sets of structurally diverse compounds. Given the large number of variable selection techniques available today, identifying the best one is a difficult task. This review summarizes current knowledge of variable selection methods applied to well-known statistical techniques such as linear regression, partial least squares (PLS), k-nearest neighbors (kNN), and artificial neural networks, with the aim of disseminating the advances in this important stage of QSAR model building.
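As a minimal illustration of the simplest family of variable selection methods (filter methods), the sketch below ranks descriptors by the absolute Pearson correlation with activity and skips any descriptor that is highly intercorrelated with one already kept. It is a generic correlation filter, not one of the specific methods surveyed in the review; the function names, the thresholds (0.9 intercorrelation, 0.1 minimum relevance) and the toy descriptor values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no information
    return sxy / math.sqrt(sxx * syy)


def correlation_filter(
    descriptors: Dict[str, List[float]],
    activity: List[float],
    max_features: int,
    max_intercorrelation: float = 0.9,
    min_relevance: float = 0.1,
) -> List[str]:
    """Greedy filter: keep the most activity-correlated, non-redundant descriptors."""
    # Relevance of each descriptor = |r| with the activity.
    scores = {name: abs(pearson(vals, activity)) for name, vals in descriptors.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected: List[str] = []
    for name in ranked:
        # Ranked in descending order, so nothing later can pass min_relevance.
        if len(selected) >= max_features or scores[name] < min_relevance:
            break
        redundant = any(
            abs(pearson(descriptors[name], descriptors[kept])) > max_intercorrelation
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


if __name__ == "__main__":
    toy = {
        "d1": [1, 2, 3, 4, 5],
        "d2": [2, 4, 6, 8, 10.5],  # nearly a rescaled copy of d1
        "d3": [5, 3, 4, 1, 2],
        "d4": [1, 1, 1, 1, 1],  # constant, uninformative
    }
    act = [1.1, 2.0, 2.9, 4.2, 5.0]
    print(correlation_filter(toy, act, max_features=3))  # ['d1', 'd3']
```

Filters like this are fast and model-independent, which suits large descriptor pools; wrapper methods (for example stepwise regression or genetic algorithms) instead score subsets by the performance of the QSAR model itself, at a higher computational cost.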