Describing the connectivity of chemical and/or biological systems as networks offers a direct gateway for introducing mathematical tools into proteomics. Networks, even very large ones, are simple objects composed of at least nodes and edges. The nodes represent the parts of the system, and the edges represent geometric and/or functional relationships between those parts. In proteomics, amino acids, proteins, electrophoresis spots, polypeptide fragments, or more complex objects can play the role of nodes. All of these networks can be described numerically using the so-called Connectivity Indices (CIs). Transforming graphs (pictures) into CIs (numbers) facilitates the manipulation of information and the search for structure-function relationships in proteomics. In this work, we review and comment on the challenges and new trends in the definition and application of CIs in proteomics. Emphasis is placed on 1-D-CIs for DNA and protein sequences, 2-D-CIs for RNA secondary structures, 3-D-topographic indices (TPGIs) for protein function annotation without alignment, 2-D-CIs and 3-D-TPGIs for the study of drug-protein or drug-RNA quantitative structure-binding relationships, and pseudo 3-D-CIs for protein surface molecular recognition. We also cover CIs that describe Protein Interaction Networks or RNA co-expression networks, as well as 2-D-CIs for patient blood proteome 2-DE maps or mass spectra.
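To make the graph-to-number transformation concrete, the following minimal sketch computes one classical connectivity index, the Randić index, which sums 1/sqrt(deg(u)·deg(v)) over all edges. The abstract does not name a specific index, so the choice of the Randić index, the function name `randic_index`, and the toy four-node path are assumptions made purely for illustration.

```python
from math import sqrt


def randic_index(edges):
    """Randić connectivity index of an undirected graph given as an edge list.

    Each edge (u, v) contributes 1 / sqrt(deg(u) * deg(v)).
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(1.0 / sqrt(degree[u] * degree[v]) for u, v in edges)


# Hypothetical toy network: four nodes joined in a path A-B-C-D.
# Nodes could stand for residues, proteins, or 2-DE spots.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D")]
chi = randic_index(toy_edges)
print(round(chi, 4))  # 1.9142
```

The same function applies unchanged to any network that can be written as an edge list, which is the sense in which a picture becomes a number.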
The numerical encoding of chemical structure with Topological Indices (TIs) is growing in importance in Medicinal Chemistry and Bioinformatics. This approach allows the rapid collection, annotation, retrieval, comparison and mining of chemical structures within large databases. TIs can subsequently be used to seek quantitative structure-activity relationships (QSAR), which are models connecting chemical structure with biological activity. In the early 1990s, there was an explosion in the introduction and definition of new TIs. The Handbook of Molecular Descriptors by Todeschini and Consonni lists more than 1500 of these indices. By the end of the last century, researchers had produced a large number of TIs with essentially the same advantages and/or disadvantages. Consequently, many researchers abandoned the definition of TIs for a time. In our opinion, one of the problems associated with TIs is that researchers aimed their efforts only at the codification of chemical connectivity for small-sized drugs. As a consequence, it has recently seemed that we have arrived at "Fukuyama's End of History in TIs definition". In the work described here, we review and comment on the "quo vadis" and challenges in the definition of TIs as we enter the new century. Emphasis is placed on new chiral TIs (CTIs), flexible TIs for unifying QSAR models with multiple targets, topographic indices (TPGIs), TIs for DNA and protein sequences, TIs for 2D RNA structures, TPGIs and drug-protein or drug-RNA quantitative structure-binding relationship (QSBR) studies, TIs to encode protein surface information and TIs for protein interaction networks (PINs).
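As an illustration of how a TI encodes chemical connectivity as a single number, the sketch below computes the Wiener index (the sum of shortest-path distances over all pairs of atoms) for two hydrogen-suppressed graphs. The abstract does not single out this index; it is used here only as a well-known example, and the function name and the adjacency-list encoding are assumptions. The graph is assumed to be connected.

```python
from collections import deque


def wiener_index(adjacency):
    """Wiener index of a connected, unweighted graph.

    `adjacency` maps each node to a list of its neighbors. Distances are
    found by breadth-first search from every node; each pair is counted
    twice, so the total is halved.
    """
    total = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    return total // 2


# Carbon skeletons of two C4H10 isomers (hydrogens suppressed).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The two isomers share a formula but receive different index values, which is what makes such numbers usable for database comparison and as QSAR inputs.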
The development of 2D graph-theoretic representations for DNA sequences has been very important for the qualitative and quantitative comparison of sequences. Calculating numeric features for these representations is useful for DNA-QSAR studies. Most of these graph-theoretic representations identify each of the four bases with a unit step along one axis direction in 2D space. In the case of proteins, twenty amino acids instead of four bases have to be considered. This fact has limited the introduction of useful 2D Cartesian representations and the corresponding sequence descriptors for encoding protein sequence information. In this study, we overcome this problem by grouping amino acids into four groups: acidic, basic, polar and non-polar amino acids. Identifying each group with one of the four axis directions yields a novel 2D representation and numeric descriptors for protein sequences. A Markov model was then used to calculate new numeric descriptors of the protein sequence, called herein the sequence 2D coupling numbers ($f_k$). In this work, we calculated the $f_k$ values for 108 sequences of different polygalacturonases (PGs) and for 100 sequences of other proteins. A Linear Discriminant Analysis model derived here ($PG = 5.36 \cdot f_1 - 3.98 \cdot f_3 - 42.21$) successfully discriminates between PGs and other proteins. The model correctly classified 100% of a subset of 81 PG and 75 non-PG protein sequences used to train the model. The model also correctly classified 51 out of 52 (98.07%) of the protein sequences used as an external validation series. The use of different amino acid groupings and/or axis orientations gives different results, so exploring them for other databases is suggested. Finally, to illustrate the use of the model, we report the isolation and prediction of PG action for a novel sequence (AY908988) isolated by our group from Psidium guajava L. This prediction agrees very well with the sequence alignment results obtained with BLAST. These findings illustrate the possibilities of the sequence descriptors derived from this novel 2D sequence representation in protein sequence QSAR studies.
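The four-group walk and the discriminant function described in this abstract can be sketched as follows. The abstract does not state which residues belong to which group, which axis direction each group takes, or the decision cutoff, so the group memberships, the `STEP` directions, and the names `walk_2d` and `pg_score` are assumptions for illustration. The Markov step that produces the coupling numbers is not reproduced; `pg_score` simply evaluates the reported function, read as PG = 5.36·f1 − 3.98·f3 − 42.21, for values of f1 and f3 supplied by the caller.

```python
# Assumed grouping of the twenty amino acids (not specified in the abstract).
GROUPS = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "polar": set("STNQCYG"),
    "nonpolar": set("AVLIMFWP"),
}

# Assumed axis direction for each group: one unit step per residue.
STEP = {
    "acidic": (1, 0),
    "basic": (-1, 0),
    "polar": (0, 1),
    "nonpolar": (0, -1),
}


def walk_2d(sequence):
    """Return the list of 2D lattice points visited by the sequence walk."""
    x, y = 0, 0
    path = [(x, y)]
    for residue in sequence.upper():
        for group, members in GROUPS.items():
            if residue in members:
                dx, dy = STEP[group]
                break
        else:
            raise ValueError(f"unknown residue: {residue!r}")
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def pg_score(f1, f3):
    """Evaluate the reported discriminant function for given f1 and f3."""
    return 5.36 * f1 - 3.98 * f3 - 42.21


print(walk_2d("DKSA"))  # [(0, 0), (1, 0), (0, 0), (0, 1), (0, 0)]
```

Changing `GROUPS` or `STEP` changes the resulting walk, which mirrors the abstract's remark that other groupings and axis orientations give different results.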
This work explores the potential of the MARCH-INSIDE methodology to seek a QSAR for MAO-A inhibitors from a heterogeneous series of compounds. A Markov model was used to quickly calculate the molecular electron delocalization, polarizability, refractivity, and n-octanol/water partition coefficients for a series of 1406 active/nonactive compounds. LDA was subsequently used to fit a classification function. The model showed a global accuracy of 92.8% in training and a predictability of 91.8% in validation. This QSAR model was validated through a virtual screening of a series of coumarin derivatives. The 15 selected compounds were prepared and evaluated as in vitro MAO-A inhibitors. When the theoretical predictions were compared with the experimental results, the model correctly predicted 13 compounds; the two mistakes involved compounds with activities very close to the cutoff point established for the model. Consequently, this method represents a useful tool for the "in silico" screening of MAO-A inhibitors.
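The abstract does not give the MARCH-INSIDE equations, so the sketch below is not that method. It is a generic illustration, under stated assumptions, of how a Markov chain defined on a molecular graph can turn atomic property values into a k-step average descriptor. The transition rule (step to a bonded atom or stay, with probability proportional to the destination atom's weight), the uniform starting distribution, the function names, and the numeric weights are all hypothetical.

```python
def transition_matrix(adjacency, weights):
    """Row-stochastic matrix for a walk on a molecular graph.

    From atom i the walker may stay at i or move to any bonded atom j,
    with probability proportional to weights[j] (an assumed rule).
    """
    n = len(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        reachable = [i] + list(adjacency[i])
        total = sum(weights[j] for j in reachable)
        for j in reachable:
            matrix[i][j] = weights[j] / total
    return matrix


def k_step_average(matrix, weights, k):
    """Expected atomic property after k steps from a uniform start."""
    n = len(weights)
    dist = [1.0 / n] * n
    for _ in range(k):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return sum(p * w for p, w in zip(dist, weights))


# Hypothetical three-atom chain 0-1-2 with illustrative property values.
adjacency = [[1], [0, 2], [1]]
weights = [2.5, 2.5, 3.5]

P = transition_matrix(adjacency, weights)
descriptors = [k_step_average(P, weights, k) for k in range(4)]
print(descriptors)
```

A vector of such k-step values per molecule is the kind of numeric input a linear discriminant analysis could then use to fit a classification function.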