2010
DOI: 10.1093/bioinformatics/btq003

CD-HIT Suite: a web server for clustering and comparing biological sequences

Abstract: Summary: CD-HIT is a widely used program for clustering and comparing large biological sequence datasets. In order to further assist the CD-HIT users, we significantly improved this program with more functions and better accuracy, scalability and flexibility. Most importantly, we developed a new web server, CD-HIT Suite, for clustering a user-uploaded sequence dataset or comparing it to another dataset at different identity levels. Users can now interactively explore the clusters within web browsers. We also p…

Citations: cited by 2,213 publications (1,697 citation statements)
References: 9 publications
“…Assembled transcripts were aligned to our genome sequence using NCBI blastn v.2.2.30+ with an e-value cut-off of 1 × 10⁻⁵. Successfully aligned transcripts were clustered at 90% identity using CD-HIT (v. 4.5.4) 45, with representative sequences from each cluster retained and used to help parameterize gene calling. Eighty-seven per cent of the trimmed RNA-seq reads aligned to the Oropetium genome, suggesting that the genome is largely complete (Supplementary Table 5).…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Aligned and representative sequences from our transcriptome assembly were input to Maker as expressed sequence tag evidence. Rice and Brachypodium proteome sequences clustered at 90% identity using CD-HIT (v. 4.5.4) 45 with representative sequences from each cluster retained and input to Maker as multi-organismal protein homology evidence. The Oropetium repeat database was input to Maker as a custom repeat library.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Twenty-nine vertebrate proteomes were downloaded from GenBank (28 proteomes) and Ensembl (1 proteome). The CD-HIT algorithm was used to exclude redundant sequences in these proteomes (Huang et al, 2010). The orthologous proteins were screened out using the best reciprocal hit BLAST (version 2.2.28+) (Altschul et al, 1997).…”
Section: Evaluation Of the Foxp Subfamily Duplication Process Using M
Citation type: mentioning
confidence: 99%
“…In this regard, sequences with high genetic similarity (i.e. genetic distance < 5%) (13) from each non-neighboring country were clustered using the Cluster Database at High Identity with Tolerance (CD-HIT) program (24), accessible via an online web server (25), and only one sequence per cluster was retained in each data set. Since the number of sequences from Iran and its neighboring countries was small, we did not reduce the number of these data.…”
Section: HIV-1 Subtype B and CRF01_AE Data Sets
Citation type: mentioning
confidence: 99%