We used a simple LRA based on the maximal similarity (or minimal distance) scores of the two top-ranking sequence classes. The scoring methods (Smith-Waterman, BLAST, local alignment kernel and compression-based distances) were compared on datasets designed to test sequence similarities between proteins distantly related in terms of structure or evolution. It was found that LRA-based scoring can significantly outperform simple scoring methods.
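The idea of scoring a query by the two top-ranking classes can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the formula, so the log ratio of the two best per-class similarities used here is an assumed form, and the function name `lra_score` is hypothetical.

```python
import math


def lra_score(similarities, labels):
    """Return (top_class, score) for one query sequence.

    similarities: similarity of the query to each database sequence
                  (for example Smith-Waterman scores); must be positive.
    labels:       class label of each database sequence.

    Assumed scoring form (not taken from the source): the log ratio of the
    best similarity in the top-ranking class to the best similarity in the
    second-ranking class.
    """
    # Keep only the maximal similarity observed for each class.
    best_per_class = {}
    for score, label in zip(similarities, labels):
        if label not in best_per_class or score > best_per_class[label]:
            best_per_class[label] = score

    if len(best_per_class) < 2:
        raise ValueError("at least two classes are needed")

    # Rank classes by their maximal similarity, highest first.
    ranked = sorted(best_per_class.items(), key=lambda kv: kv[1], reverse=True)
    (top_class, first), (_, second) = ranked[0], ranked[1]

    if first <= 0 or second <= 0:
        raise ValueError("similarities must be positive for a log ratio")

    return top_class, math.log(first / second)
```

A score near zero means the two best classes are almost indistinguishable for this query, while a large score means the top class clearly dominates. For distance measures, the same sketch would apply with minimal distances and the ratio inverted.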
The emerging role of internal dynamics in protein fold and function requires new avenues of structure analysis. We analyzed the dynamically restrained conformational ensemble of ubiquitin generated from residual dipolar coupling data, in terms of protruding and buried atoms as well as interatomic distances, using four proximity-based algorithms, CX, DPX, PRIDE and PRIDE-NMR (http://hydra.icgeb.trieste.it/protein/). We found that ubiquitin, a relatively rigid molecule, has a highly diverse dynamic ensemble. The environment of protruding atoms is highly variable across conformers, whereas only some of the buried atoms tend to fluctuate. The variability of the ensemble cautions against the use of single conformers when explaining functional phenomena. We also give a detailed evaluation of PRIDE-NMR on a wide dataset and discuss its usage in the light of the features of available NMR distance restraint sets in public databases.
SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992; Pongor et al., Nucl. Acids. Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences, the SBASE domain library, and standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types, which are sometimes missed even by more sophisticated methodologies. This approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.
Fragment-based descriptions of protein structures have been successfully used in fast comparison methods for protein structure mining and classification. These descriptions reduce the dimensionality problem and allow the application of sequence alignment techniques in structural comparison of proteins. Most fragment-based alphabets were derived from secondary structure H-bond patterns or from clustering of local substructures. Both approaches have shown promising results in protein mining and classification, though their accuracy is still lower than that of accurate geometrical methods. In this paper, we describe two original H-bond based alphabets, HB A and HB B, obtained from clustering experimentally determined protein structures. We compare these two new H-bond based alphabets with two secondary-structure (DSSP and STR) based and two backbone-fragment (KL and Q16) based alphabets. Information theory analysis showed that the information content is proportional to the size and coverage of the structural alphabets and that alphabets of the same class are more similar to each other. Amino acid sequences shared more information with the new H-bond based alphabets than with other alphabets, though they presented the lowest mutual information with the sequences of secondary-structure based alphabets. The comparison of alignments obtained using the Smith-Waterman algorithm showed that similar classes have similar alignments and that the most dissimilar alignments of alphabets of the same class were those from HB A and HB B. H-bond based alphabets presented the best performance for protein classification using First-Nearest Neighbor and three different similarity measures: the scores of alignments obtained from the Smith-Waterman algorithm; the inner products of the normalized vectors from N-GRAM (N=1, 2, 3 and 4) decomposition of the sequences; and the probabilities of belonging to the Hidden Markov Model of every training group of the dataset.
In addition, we showed that using First-Nearest Neighbor and the Log-Likelihood Ratio index of Needleman-Wunsch algorithm scores, the H-bond based alphabets presented performances very close to DALI for structure based protein classification.
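The N-GRAM similarity combined with First-Nearest Neighbor classification can be illustrated with a minimal sketch. Several details are assumptions rather than the authors' procedure: all N-grams for N = 1 to 4 are pooled into a single count vector, the vector is normalized to unit Euclidean length, and the function names and the toy structural-alphabet strings are hypothetical.

```python
import math
from collections import Counter


def ngram_vector(seq, max_n=4):
    """Count all N-grams (N = 1..max_n) of a structural-alphabet string
    and scale the counts to unit Euclidean length (assumed normalization)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    norm = math.sqrt(sum(v * v for v in counts.values()))
    if norm == 0:
        return {}
    return {gram: v / norm for gram, v in counts.items()}


def inner_product(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller dict
    return sum(weight * v.get(gram, 0.0) for gram, weight in u.items())


def first_nearest_neighbor(query, training, max_n=4):
    """Return the label of the training sequence most similar to the query.

    training: list of (sequence, label) pairs.
    """
    q = ngram_vector(query, max_n)
    best_label, best_sim = None, -1.0
    for seq, label in training:
        sim = inner_product(q, ngram_vector(seq, max_n))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label
```

With unit-length vectors the inner product is the cosine similarity, so it lies between 0 and 1 and equals 1 for identical N-gram profiles. For example, with the toy training set `[("HHHHHHHH", "alpha"), ("EEEEEEEE", "beta")]`, the query `"HHHHEH"` is assigned to `"alpha"`.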