Abstract: Knowledge regarding the 3D structure of a protein provides useful information about the protein's functional properties. In particular, structural similarity between proteins can be a good predictor of functional similarity. One method that compares proteins using their 3D geometrical structure is the similarity value (SV). In this paper, we introduce a new definition of the SV measure for comparing two proteins. To this end, we consider the mass of the protein's atoms and concentrate on…
“…This issue requires further investigation. To limit the biases, it has been proposed to remove the amino acids corresponding to indels from structures before calculating distances (17) or to normalize distances (82, 83). A key question is whether to include indels or leave them out, as most studies based on sequence comparison do.…”
Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while approaches based on comparing three-dimensional structures are still in their infancy. In this study, we propose a new and effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in three-dimensional protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g., classification, folding). However, it has never been used to study the phylogenetic signal that proteins may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from ten major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, and it points to future developments of topological analysis in the life sciences.
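To make the idea of extracting geometric features "at different spatial resolutions" concrete, here is a minimal sketch of the simplest persistent-homology computation: the 0-dimensional barcode of a point cloud. It is an illustration only and not the pipeline used in the study. The function name `h0_death_times` and the input format (a list of 3D coordinates, for example C-alpha positions) are assumptions made for this example. The sketch uses the convention that an edge enters the filtration at a scale equal to the distance between its endpoints.

```python
import itertools
import math


def h0_death_times(points):
    """Death scales of the finite 0-dimensional classes of a point cloud.

    Hypothetical helper for illustration. Every point is born as its own
    connected component at scale 0. Edges are added in order of increasing
    length; each time an edge joins two distinct components, one component
    dies at that edge length. One component never dies and is not reported.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)    # one component dies at this scale
    return deaths
```

For n distinct points the function returns n - 1 death scales in increasing order. Comparing two structures would additionally require a distance between such barcodes (and typically higher-dimensional features as well), which this sketch does not attempt.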
Numerous proteins are molecular targets for drug action and hence are important in drug discovery. Structure-based computational drug discovery relies on detailed information regarding protein conformations for subsequent drug screening in silico. There are two key issues in analyzing protein conformations in virtual screening. The first considers the protein’s conformational change in response to physical and chemical conditions. The second is the protein’s atomic resolution reconstruction from X-ray crystallography or nuclear magnetic resonance (NMR) data. In this latter problem, information is needed regarding the sample’s position relative to the source of X-rays. Here, we introduce a new measure for classifying protein conformational states using spectral representation and Wigner’s D-functions. Predictions based on the new measure are in good agreement with conformational states of proteins. These results could also be applied to improve conformational alignment of the snapshots given by protein crystallography.