The present manuscript introduces, for the first time, a novel alignment-free 3D-QSAR method (QuBiLS-MIDAS) based on tensor concepts, using the three-linear and four-linear algebraic forms as specific cases of n-linear maps. To this end, the k(th) three-tuple and four-tuple spatial-(dis)similarity matrices are defined as tensors of order 3 and 4, respectively, to represent 3D information among "three and four" atoms of the molecular structures. Several measures (multi-metrics) to establish (dis)similarity relations among "three and four" atoms are discussed, as well as normalization schemes for the n-tuple spatial-(dis)similarity matrices based on the simple-stochastic and mutual probability algebraic transformations. To consider specific interactions among atoms, for both the global and the local indices, n-tuple path and length cut-off constraints are introduced. This algebraic scaffold can also be seen as a generalization of the vector-matrix-vector multiplication procedure (the matrix representation of the traditional linear, quadratic, and bilinear forms) for the calculation of molecular descriptors, and is thus a new theoretical approach with a methodological contribution. A variability analysis based on Shannon's entropy reveals that the best distributions are achieved with the ternary and quaternary measures corresponding to the bond and dihedral angles. In addition, the proposed indices show better entropy behavior than the descriptors calculated by other programs used in chemoinformatics studies, such as DRAGON, PADEL, and Mold2. A principal component analysis shows that the novel 3D n-tuple indices codify the same information captured by the DRAGON 3D indices, as well as information not codified by the latter. To gain deeper insight into the contribution of the novel molecular parameters, a QSAR study was performed for the binding affinity to the corticosteroid-binding globulin, using Cramer's steroid database.
The results reveal better statistical parameters for the Bond Angle and Dihedral Angle approaches, consistent with the results of the variability analysis. Finally, the QuBiLS-MIDAS models outperform all 3D-QSAR methods reported in the literature that use the 31 steroids as the training set, and for the popular division of Cramer's database into training (1-21) and test (22-31) sets, they predict the activity of the steroids with comparable or superior accuracy. These results suggest that the proposed QuBiLS-MIDAS n-tuple indices are a useful tool for chemoinformatics studies.
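The abstract describes the QuBiLS-MIDAS indices as a generalization of the vector-matrix-vector product to n-linear forms. As a hedged illustration of that algebra only, the sketch below contracts an order-n nested-list tensor with n atom-property vectors; for n = 2 it reduces to the familiar bilinear form x^T M y. The function names, the toy tensor, and the property weights are hypothetical: the real three-tuple and four-tuple spatial-(dis)similarity tensors (built from bond angles, dihedral angles and other metrics, then normalized and filtered by cut-offs) are not reproduced here.

```python
from itertools import product


def tensor_entry(tensor, index):
    """Return tensor[i1][i2]...[in] for a nested-list tensor and an index tuple."""
    value = tensor
    for i in index:
        value = value[i]
    return value


def n_linear_form(tensor, vectors):
    """Contract an order-n tensor with n vectors of equal length:

        sum over (i1, ..., in) of T[i1]...[in] * v1[i1] * ... * vn[in]

    For n = 2 this is the bilinear form x^T M y; n = 3 and n = 4 give the
    three-linear and four-linear forms.
    """
    size = len(vectors[0])
    total = 0.0
    for index in product(range(size), repeat=len(vectors)):
        term = tensor_entry(tensor, index)
        for vector, i in zip(vectors, index):
            term *= vector[i]
        total += term
    return total


# Bilinear case: x^T M y with a hypothetical 2x2 matrix.
M = [[1, 2], [3, 4]]
print(n_linear_form(M, [[1, 1], [1, 0]]))  # 4.0

# Three-linear case: a toy order-3 tensor equal to 1 only for triples of
# three distinct atoms (a stand-in for a real three-tuple matrix).
T = [[[1.0 if len({i, j, k}) == 3 else 0.0 for k in range(3)]
      for j in range(3)] for i in range(3)]
x = [1.0, 2.0, 3.0]  # hypothetical atomic property weights
print(n_linear_form(T, [x, x, x]))  # 36.0
```

The brute-force loop over all index tuples keeps the correspondence with the summation formula explicit; a practical implementation would use sparse tensors and the cut-off constraints to skip irrelevant atom tuples.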
Trichomonas vaginalis (Tv) is the causative agent of the most common non-viral sexually transmitted disease in women and men worldwide. Since 1959, metronidazole (MTZ) has been the drug of choice for the systemic treatment of trichomoniasis. However, MTZ resistance in some patients and the high cost of developing new trichomonacidals make it necessary to develop computational methods that shorten the drug discovery pipeline. To this end, bond-based linear indices, new TOMOCOMD-CARDD molecular descriptors, and linear discriminant analysis were used to discover novel trichomonacidal chemicals. The models obtained with non-stochastic and stochastic indices correctly classify 89.01% (87.50%) and 82.42% (84.38%) of the chemicals in the training (test) sets, respectively. These results validate the models for use in ligand-based virtual screening. In addition, they show large Matthews correlation coefficients (C) of 0.78 (0.71) and 0.65 (0.65) for the training (test) sets, respectively. The predictions in the 10% full-out cross-validation test also evidence the robustness of the models. Both models were then applied to the virtual screening of 12 compounds with known activity against Tv, correctly classifying 10 of 12 (83.33%) and 9 of 12 (75.00%) of the chemicals, respectively; this is the most important criterion for validating the models. In addition, the classification functions were applied to a library of seven chemicals to find novel antitrichomonal agents. These compounds were synthesized and tested for in vitro activity against Tv, and the experimental observations agreed with the theoretical predictions: 85.71% (6 of 7) of the chemicals were correctly classified.
Moreover, of the seven compounds that were screened, synthesized, and biologically assayed, six (VA7-34, VA7-35, VA7-37, VA7-38, VA7-68, VA7-70) show pronounced cytocidal activity at 100 µg/ml at 24 h (48 h), within the range of 98.66%-100% (99.40%-100%), while only two molecules (VA7-37 and VA7-38) show high cytocidal activity at 10 µg/ml at 24 h (48 h): 98.38% (94.23%) and 97.59% (98.10%), respectively. The LDA-assisted QSAR models presented here could significantly reduce the number of compounds synthesized and tested and could increase the chance of finding new chemical entities with antitrichomonal activity.
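The classification quality above is reported as the percentage of correctly classified chemicals and as the Matthews correlation coefficient (C). A minimal sketch of both statistics from a two-class confusion matrix follows; the counts in the usage lines are hypothetical and unrelated to the paper's data, and returning 0.0 when a marginal sum is zero is an assumed convention.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified chemicals."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a two-class confusion matrix.

    Returns 0.0 when any marginal sum is zero (an assumed convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion matrix: 8 actives and 7 inactives correctly
# classified, 2 inactives predicted active, 3 actives predicted inactive.
print(accuracy(8, 7, 2, 3))               # 0.75
print(round(matthews_cc(8, 7, 2, 3), 3))  # 0.503
```

Unlike plain accuracy, C uses all four cells of the confusion matrix, so it stays informative when active and inactive classes are unbalanced.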
Cluster algorithms play an important role in diversity-related tasks of modern chemoinformatics, with their widest applications in pharmaceutical industry drug discovery programs. The performance of these grouping strategies depends on various factors, such as molecular representation, mathematical method, algorithmic technique, and statistical distribution of the data. For this reason, new methods must be introduced and compared in order to find the model that best fits the problem at hand. Earlier comparative studies report that Ward's algorithm, using fingerprints as molecular descriptions, is generally superior in this field. However, problems remain: other types of numerical descriptions have been little exploited, the current descriptor selection strategy is driven by trial and error, and no previous comparative study has considered a broader domain of combinatorial methods for grouping chemoinformatic data sets. In this work, combinatorial methods are compared, five of them novel in cheminformatics. The experiments use eight data sets that are well established and validated in the medicinal chemistry literature. Each drug data set was represented by real-valued molecular descriptors selected by machine learning techniques consistent with the neighborhood principle. Statistical analysis of the results demonstrates that the pharmacological activities of the eight data sets can be modeled with a few families of 2D and 3D molecular descriptors, avoiding classification problems associated with the presence of nonrelevant features. Three of the five proposed cluster algorithms outperform most classical algorithms and perform similarly to (or, in the most optimistic sense, slightly better than) Ward's algorithm.
The usefulness of these algorithms is also assessed in a comparison against potent QSAR and machine learning classifiers, where they perform similarly in some cases.
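Ward's algorithm is the reference method in this comparison. The sketch below is a textbook agglomerative Ward clustering using the Lance-Williams update on squared Euclidean distances; it is not one of the authors' proposed combinatorial methods, and the five 2D descriptor vectors in the usage lines are hypothetical.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative Ward clustering; returns clusters as sorted index lists."""
    clusters = {i: [i] for i in range(len(points))}
    # Start from squared Euclidean distances between singleton clusters;
    # keys are (smaller id, larger id).
    dist = {
        (a, b): sum((p - q) ** 2 for p, q in zip(points[a], points[b]))
        for a in range(len(points))
        for b in range(a + 1, len(points))
    }
    next_id = len(points)
    while len(clusters) > n_clusters:
        # Merge the pair with the smallest Ward merging cost.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        # Lance-Williams update: distance from every other cluster k
        # to the newly merged cluster i + j under Ward's criterion.
        updated = {}
        for k, members in clusters.items():
            n_k = len(members)
            d_ki = dist[(min(k, i), max(k, i))]
            d_kj = dist[(min(k, j), max(k, j))]
            updated[k] = ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj
                          - n_k * d_ij) / (n_i + n_j + n_k)
        # Drop every pair that involves the two merged clusters.
        dist = {pair: d for pair, d in dist.items()
                if i not in pair and j not in pair}
        for k, d in updated.items():
            dist[(k, next_id)] = d  # next_id exceeds every existing id
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical 2D descriptor vectors for five molecules.
molecules = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
print(ward_clusters(molecules, 3))  # [[0, 1], [2, 3], [4]]
```

The Lance-Williams recurrence avoids recomputing centroids after each merge: the updated distances equal Ward's merging cost, which grows with the cluster sizes and the squared distance between centroids.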
Research on similarity searching of cheminformatic data sets has focused on similarity measures using fingerprints. However, nominal scales are the least informative of all metric scales, which increases tied similarity scores and decreases the effectiveness of retrieval engines. Tanimoto's coefficient has been claimed to be the most prominent measure for this task. Nevertheless, this field is far from exhausted, since the no free lunch theorem of computer science predicts that "no similarity measure has overall superiority over the population of data sets". We introduce 12 relational agreement (RA) coefficients for seven metric scales, which are integrated within a group fusion-based similarity searching algorithm. These similarity measures are compared with a reference panel of 21 proximity quantifiers over 17 benchmark data sets (MUV), using informative descriptors, a feature selection stage, a suitable performance metric, and powerful comparison tests. In this stage, RA coefficients perform favourably with respect to the state-of-the-art proximity measures. Afterward, the RA-based method outperforms four other nearest neighbor searching algorithms over the same data domains. In a third validation stage, RA measures are successfully applied to the virtual screening of the NCI data set. Finally, we discuss a possible molecular interpretation of these similarity variants.
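For context on the baseline, the sketch below computes Tanimoto's coefficient on binary fingerprints stored as sets of "on" bit positions and ranks a database by group fusion, assuming the MAX fusion rule (each molecule keeps its highest similarity to any reference active). The RA coefficients introduced in the paper are not reproduced; the fingerprints and molecule names are hypothetical, and scoring two empty fingerprints as 0.0 is an assumed convention.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on'
    bit positions; 0.0 for two empty fingerprints (an assumed convention)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion_search(references, database, fuse=max):
    """Score every database molecule by fusing (MAX rule by default) its
    similarities to all reference actives, then rank by decreasing score."""
    scores = {
        name: fuse(tanimoto(fp, ref) for ref in references)
        for name, fp in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical reference actives and screening database.
references = [{1, 2, 3}, {7, 8}]
database = {"mol_a": {1, 2, 3}, "mol_b": {7, 8, 9}, "mol_c": {4, 5}}
for name, score in group_fusion_search(references, database):
    print(name, round(score, 3))  # mol_a 1.0, mol_b 0.667, mol_c 0.0
```

Passing a different `fuse` function (for example `sum`) switches the fusion rule without changing the similarity measure, which is the same separation that allows alternative coefficients to be plugged into a group fusion search.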
Bond-based quadratic indices, new TOMOCOMD-CARDD molecular descriptors, and linear discriminant analysis (LDA) were used to discover novel lead trichomonacidals. The LDA-based quantitative structure-activity relationship (QSAR) models obtained with nonstochastic and stochastic indices correctly classified 87.91% (87.50%) and 89.01% (84.38%) of the chemicals in the training (test) sets, respectively. They showed large Matthews correlation coefficients of 0.75 (0.71) and 0.78 (0.65) for the training (test) sets, respectively. Both models were then applied to the virtual screening of 21 chemicals to find new lead antitrichomonal agents. Predictions agreed with the experimental results to a great extent: both models correctly classified 95.24% (20 of 21) of the chemicals. Of the 21 compounds that were screened and synthesized, 2 molecules (G-1 and UC-245) showed high to moderate cytocidal activity at 10 mg/ml, another 2 compounds (G-0 and CRIS-148) showed high cytocidal activity only at 100 mg/ml, and the remaining chemicals (from CRIS-105 to CRIS-153, except CRIS-148) were inactive at the assayed concentrations. Finally, the best candidate, G-1 (100% cytocidal activity at 10 mg/ml), was assayed in vivo in ovariectomized Wistar rats, achieving promising results as a trichomonacidal drug-like compound. (Journal of Biomolecular Screening 2008:785-794).
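The trichomonacidal models above are built with linear discriminant analysis. As a rough illustration of the underlying idea, here is a textbook two-class Fisher discriminant in plain Python: the direction is w = S_w^-1 (m1 - m0), with S_w the pooled within-class scatter matrix, and the cut-off is placed halfway between the projected class means (an equal-prior assumption). The authors' actual software, variable selection, and descriptor sets are not reproduced; the two-descriptor training data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def column_means(rows):
    return [sum(column) / len(rows) for column in zip(*rows)]


def fit_fisher_lda(inactives, actives):
    """Return a classifier predicting 1 (active) or 0 (inactive)."""
    m0, m1 = column_means(inactives), column_means(actives)
    d = len(m0)
    # Pooled within-class scatter matrix S_w.
    s_w = [[0.0] * d for _ in range(d)]
    for rows, mean in ((inactives, m0), (actives, m1)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mean)]
            for i in range(d):
                for j in range(d):
                    s_w[i][j] += diff[i] * diff[j]
    # Discriminant direction w = S_w^-1 (m1 - m0).
    w = solve(s_w, [b - a for a, b in zip(m0, m1)])

    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    # Equal-prior assumption: cut halfway between the projected class means.
    threshold = (score(m0) + score(m1)) / 2

    def predict(x):
        return 1 if score(x) > threshold else 0

    return predict


# Hypothetical two-descriptor training data.
inactives = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)]
actives = [(3.0, 3.0), (4.0, 3.5), (3.5, 4.0)]
predict = fit_fisher_lda(inactives, actives)
print(predict((0.2, 0.3)), predict((3.8, 3.6)))  # 0 1
```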
In this report, two data sets involving the main antidiabetic enzyme targets, α‐amylase and α‐glucosidase, are used. The α‐amylase and α‐glucosidase inhibitory activity of antidiabetic compounds is predicted using LDA and classification trees (CT). Large data sets of 640 compounds for α‐amylase and 1546 compounds for α‐glucosidase were selected to develop the tree models. CT‐J48 gives the best classification performance for both targets, with values above 80%–90% for the training and prediction sets. The best model shows an accuracy higher than 95% for the training set; it was also validated using a 10‐fold cross‐validation procedure and an external test set, achieving accuracies of 85.32% and 86.80%, respectively. Additionally, the model was compared with other approaches previously published in the literature, showing better results. Finally, these results provide a dual‐target approach that improves the identification of antidiabetic chemicals through a two-way workflow in virtual screening pipelines.
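J48 is the Weka implementation of the C4.5 decision tree, which chooses splits by the gain ratio criterion (information gain divided by the split information). The sketch below computes that criterion for one categorical attribute. It leaves out C4.5's handling of numeric descriptors (binary thresholds), missing values, and pruning; the attribute values and labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of splitting `labels` by the attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # penalizes many-valued attributes
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical binned descriptor that separates the two classes perfectly.
values = ["high", "high", "low", "low"]
labels = ["active", "active", "inactive", "inactive"]
print(gain_ratio(values, labels))  # 1.0
```

Dividing by the split information keeps attributes with many distinct values from being favored merely because they fragment the data into small, pure branches.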
A few years ago, the World Health Organization estimated the number of adults with trichomoniasis at 170 million worldwide, more than the combined numbers for gonorrhea, syphilis, and chlamydia. To combat this sexually transmitted disease, metronidazole (MTZ) has been used since 1959 as a powerful drug for the systemic treatment of infected patients. However, increasing resistance to MTZ, adverse effects associated with high-dose MTZ therapies, and the very expensive conventional technologies involved in developing new trichomonacidals call for novel computational methods that shorten the drug discovery pipeline. Therefore, bond-based bilinear indices, new 2D bond-based TOMOCOMD-CARDD molecular descriptors (MDs), and linear discriminant analysis (LDA) are combined to discover novel antitrichomonal agents. The models generated with non-stochastic and stochastic indices correctly classify 90.11% (93.75%) and 87.92% (87.50%) of the chemicals in the training (test) sets, respectively. In addition, they show large Matthews correlation coefficients (C) of 0.80 (0.86) and 0.76 (0.71) for the training (test) sets, respectively. The predictions in the 10% full-out cross-validation test also evidence the quality of both models. To test the predictive power of the models, 12 compounds with known activity against Trichomonas vaginalis (Tv) are screened in a simulated virtual screening experiment. The models correctly classify 9 of 12 (75.00%) and 10 of 12 (83.33%) of the chemicals, respectively, which is the most important criterion for validating them. Finally, to prove the reach of the TOMOCOMD-CARDD approach and to discover new trichomonacidals, these classification functions were applied to a set of eight chemicals, which in turn were synthesized and tested for in vitro activity against Tv. Experimental observations confirm the theoretical predictions to a great extent, with 87.50% (7 of 8) of the chemicals correctly classified. Biological tests also reveal several antitrichomonal candidates: almost all the compounds [VAM2-(3-8)] exhibit pronounced cytocidal activity of 100% at 100 mg/mL at 24 h (48 h), while VAM2-2 reaches 99.37% (100%); notably, these compounds show no toxic activity in macrophage assays at this concentration. The quantitative structure-activity relationship (QSAR) models presented here could significantly reduce the number of compounds synthesized and tested, and could act as virtual shortcuts to new chemical entities with trichomonacidal activity. (QSAR Comb. Sci. 28, 2009, No. 1, 9-26; © 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim).