SUPFAM: A database of sequence superfamilies of protein domains

Pandit, Shashi Bhushan; Bhadra, Rana; Gowri, Vijayendran; Balaji, S.; Anand, B.; Srinivasan, Narayanaswamy

doi:10.1186/1471-2105-5-28

Cited by 39 publications

(19 citation statements)

References 17 publications


“…In brief, to save computing resources, all the proteins were first coarsely pre-screened for a putative tetrad hydrogen bond network or a β-bulge, the two unique structural features in WD40 proteins [53]. Concerning that these criteria may have been too strict, we also included all the other potential WD40 proteins supported by the annotations of SMART [54], Pfam [55], PROSITE [56], InterPro [57], SUPFAM [58], and Gene3D [59]. The protein sequences that passed the pre-screening were submitted to the WDSP program, which predicted the locations of each repeat, the hydrogen bond network positions, and the confidence scores of the predictions were also provided.…”

Section: Methods (mentioning)

confidence: 99%

Prokaryotic and Highly-Repetitive WD40 Proteins: A Systematic Study

Wang et al. 2017, Sci Rep

As an ancient protein family, the WD40 repeat proteins often play essential roles in fundamental cellular processes in eukaryotes. Although investigations of eukaryotic WD40 proteins have been frequently reported, prokaryotic ones remain largely uncharacterized. In this paper, we report a systematic analysis of prokaryotic WD40 proteins and detailed comparisons with eukaryotic ones. About 4,000 prokaryotic WD40 proteins have been identified, accounting for 6.5% of all WD40s. While their abundances are less than 0.1% in most prokaryotes, they are enriched in certain species from Cyanobacteria and Planctomycetes, and participate in various functions such as prokaryotic signal transduction and nutrient synthesis. Comparisons show that a higher proportion of prokaryotic WD40s tend to contain multiple WD40 domains and a large number of hydrogen bond networks. The observation that prokaryotic WD40 proteins tend to show high internal sequence identity suggests that a substantial proportion of them (~20%) should be formed by recent or young repeat duplication events. Further studies demonstrate that the very young WD40 proteins, i.e., Highly-Repetitive WD40s, should be of higher stability. Our results have presented a catalogue of prokaryotic WD40 proteins, and have shed light on their evolutionary origins.


“…(23), and our own predictions made by scanning all yeast genes with the HMMs from the Pfam (24), SMART (25) and SUPFAM (26) databases that correspond to the Weirauch DBD set (23). Similarly, we populated the database with motifs using several approaches.…”

Section: Generation Of the Database (mentioning)

confidence: 99%

YeTFaSCo: a database of evaluated yeast transcription factor sequence specificities

Boer, Hughes 2011, Nucleic Acids Research

The yeast Saccharomyces cerevisiae is a prevalent system for the analysis of transcriptional networks. As a result, multiple DNA-binding sequence specificities (motifs) have been derived for most yeast transcription factors (TFs). However, motifs from different studies are often inconsistent with each other, making subsequent analyses complicated and confusing. Here, we have created YeTFaSCo (The Yeast Transcription Factor Specificity Compendium, http://yetfasco.ccbr.utoronto.ca/), an extensive collection of S. cerevisiae TF specificities. YeTFaSCo differs from related databases by being more comprehensive (including 1709 motifs for 256 proteins or protein complexes), and by evaluating the motifs using multiple objective quality metrics. The metrics include correlation between motif matches and ChIP-chip data, gene expression patterns, and GO terms, as well as motif agreement between different studies. YeTFaSCo also features an index of ‘expert-curated’ motifs, each associated with a confidence assessment. In addition, the database website features tools for motif analysis, including a sequence scanning function and precomputed genome-browser tracks of motif occurrences across the entire yeast genome. Users can also search the database for motifs that are similar to a query motif.


“…There is no overlap between training and testing datasets. For the annotation information, we extracted the 8 types UniProt annotations of 'Subcellular localization (SL) [28]' and FDAs of 'GO [29]' , 'Pfam [30]', 'Smart [31]', 'PROSITE [32]', 'SUPFAM [33]', 'InterPro [34]', and 'PRINTS [35]' for all the proteins in the datasets. SL was reorganized by the UniProt build-in hierarchical subcellular localization table.…”

Section: Benchmark Dataset (mentioning)

confidence: 99%

Computational Prediction of Ubiquitination Proteins Using Evolutionary Profiles and Functional Domain Annotation

Qiu, Xiao, et al. 2019

Background: Ubiquitination, as a post-translational modification, is a crucial biological process in cell signaling, apoptosis, and localization. Identification of ubiquitination proteins is of fundamental importance for understanding the molecular mechanisms in biological systems and diseases. Although high-throughput experimental studies using mass spectrometry have identified many ubiquitination proteins and ubiquitination sites, the vast majority of ubiquitination proteins remain undiscovered, even in well-studied model organisms. Objective: To reduce experimental costs, computational methods have been introduced to predict ubiquitination sites, but the accuracy is unsatisfactory. If it can be predicted whether a protein can be ubiquitinated or not, it will help in predicting ubiquitination sites. However, all the computational methods so far can only predict ubiquitination sites. Methods: In this study, the first computational method for predicting ubiquitination proteins without relying on ubiquitination site prediction has been developed. The method extracts features from sequence conservation information through a grey system model, as well as functional domain annotation and subcellular localization. Results: Together with the feature analysis and application of the relief feature selection algorithm, the results of 5-fold cross-validation on three datasets achieved a high accuracy of 90.13%, with Matthew’s correlation coefficient of 80.34%. The predicted results on an independent test data achieved 87.71% as accuracy and 75.43% of Matthew’s correlation coefficient, better than the prediction from the best ubiquitination site prediction tool available. Conclusion: Our study may guide experimental design and provide useful insights for studying the mechanisms and modulation of ubiquitination pathways. The code is available at: https://github.com/Chunhuixu/UBIPredic_QWRCHX.
