2002
DOI: 10.1093/nar/30.7.1575

An efficient algorithm for large-scale detection of protein families

Abstract: Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein s…

Cited by 3,237 publications (3,109 citation statements)
References: 49 publications
“…log-transformed e-value, and MCL inflation of 2 63 ). vContact (https://bitbucket.org/MAVERICLab/vcontact) was then used to calculate a similarity score between every pair of genome and/or contigs based on the number shared of PCs between the two sequences (as in 8,9 ), and then compute a MCL clustering of the genomes/contigs based on these similarity scores (thresholds of 1 on similarity score, MCL inflation of 2).…”
Section: Dataset Of Publicly Available Viral Genomes and Genome Fragm…
type: mentioning
confidence: 99%
“…These large proteins were further analyzed as potential class 2 Cas effectors. The potential effectors were clustered to protein families based on sequence similarities using MCL 46 . These protein families were expanded by building HMMs representing each of these families, and using them to search the metagenomic datasets for similar Cas proteins.…”
Section: Crispr-cas Computation Analyses
type: mentioning
confidence: 99%
“…It specifically searches for clusters with a strongly interconnected neighborhood, which is often the case in large scale biological interaction networks. MGclus was compared to a selection of widely used clustering methods including MCL, 20 CFinder, 18 FastCommunity, 19 MCode, 7 MINE, 21 NEMO, 5 SPICi, 22 and Cohtop. 23 Three different benchmarks were conducted.…”
Section: Results
type: mentioning
confidence: 99%