The concept of multiple graph alignment (MGA) has recently been introduced as a novel method for the structural analysis of biomolecules. Using approximate graph matching techniques, this method enables the robust identification of approximately conserved patterns in biologically related structures. In particular, MGA enables the characterization of functional protein families independent of sequence or fold homology. This article first recalls the concept of MGA and then addresses the problem of computing optimal alignments from an algorithmic point of view. In this regard, a method from the field of evolutionary algorithms is proposed and empirically compared with a hitherto existing heuristic approach. Empirically, it is shown that the former yields significantly better results than the latter, albeit at the cost of an increased runtime.
Geometric objects are often represented approximately in terms of a finite set of points in threedimensional Euclidean space. In this paper, we extend this representation to what we call labeled point clouds. A labeled point cloud is a finite set of points, where each point is not only associated with a position in three-dimensional space, but also with a discrete class label that represents a specific property. This type of model is especially suitable for modeling biomolecules such as proteins and protein binding sites, where a label may represent an atom type or a physico-chemical property. Proceeding from this representation, we address the question of how to compare two labeled points clouds in terms of their similarity. Using fuzzy modeling techniques, we develop a suitable similarity measure as well as an efficient evolutionary algorithm to compute it. Moreover, we consider the problem of establishing an alignment of the structures in the sense of a one-to-one correspondence between their basic constituents. From a biological point of view, alignments of this kind are of great interest, since mutually corresponding molecular constituents offer important information about evolution and heredity, and can also serve as a means to explain a degree of similarity. In this paper, we therefore develop a method for computing pairwise or multiple alignments of labeled point clouds. To this end, we proceed from an optimal superposition of the corresponding point clouds and construct an alignment which is as much as possible in agreement with the neighborhood structure established by this superposition. We apply our methods to the structural analysis of protein binding sites.
To calculate similarities between molecular structures, measures based on the maximum common subgraph are frequently applied. For the comparison of protein binding sites, these measures are not fully appropriate since graphs representing binding sites on a detailed atomic level tend to get very large. In combination with an NP-hard problem, a large graph leads to a computationally demanding task. Therefore, for the comparison of binding sites, a less detailed coarse graph model is used building upon so-called pseudocenters. Consistently, a loss of structural data is caused since many atoms are discarded and no information about the shape of the binding site is considered. This is usually resolved by performing subsequent calculations based on additional information. These steps are usually quite expensive, making the whole approach very slow. The main drawback of a graph-based model solely based on pseudocenters, however, is the loss of information about the shape of the protein surface. In this study, we propose a novel and efficient modeling formalism that does not increase the size of the graph model compared to the original approach, but leads to graphs containing considerably more information assigned to the nodes. More specifically, additional descriptors considering surface characteristics are extracted from the local surface and attributed to the pseudocenters stored in Cavbase. These properties are evaluated as additional node labels, which lead to a gain of information and allow for much faster but still very accurate comparisons between different structures.
Methods for comparing protein binding sites are frequently validated on data sets of pockets that were obtained simply by extracting the protein area next to the bound ligands. With this strategy, any unoccupied pocket will remain unconsidered. Furthermore, a large amount of ligand-biased intrinsic shape information is predefined, inclining the subsequent comparisons as rather trivial even in data sets that hardly contain redundancies in sequence information. In this study, we present the results of a very simplistic and shape-biased comparison approach, which stress that unrestricted cavity extraction is essential to enable unexpected cross-reactivity predictions among proteins and function annotations of orphan proteins.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.