2021
DOI: 10.1038/s41592-021-01101-x

Sensitive protein alignments at tree-of-life scale using DIAMOND

Abstract: We are at the beginning of a genomic revolution in which all known species are planned to be sequenced. Accessing such data for comparative analyses is crucial in this new age of data-driven biology. Here, we introduce an improved version of DIAMOND that greatly exceeds previous search performances and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP.

Cited by 1,487 publications (1,234 citation statements)
References 30 publications
“…Taxonomic binners group sequences into bins labelled with a taxonomic identifier. For taxonomic binning, we evaluated 547 results for nine methods and versions: LSHVec v.cami2 49 , PhyloPythiaS+ v.1.4 54 , Kraken v.2.0.8-beta 55 and v.0.10.5-beta (cami1), DIAMOND v.0.9.28 56 , MEGAN v.6.15.2 57 , Ganon v.0.1.4 and v.0.3.1 58 , and NBC++ 59 . Of these, 75 were for the marine, 405 for strain madness, and 67 for plant-associated data, on either reads or gold standard assemblies (Supplementary Tables 2).…”
Section: Taxonomic Binning Challenge (mentioning)
confidence: 99%
“…Clean data were assembled using SOAPdenovo2 short sequence assembly software [58, 59]. The software Augustus 3.2.1 [60] was used for gene model prediction, and all gene annotations were completed by the software DIAMOND v2.0.7 [61] against seven databases (SWISS-PROT [62], COG [63], GO [64], KEGG [65], PHI [66], CAZy [67], and NCBI NR). The genomes of three strains of orchid mycorrhizal fungi— Sebacina vermifera (Accession No.…”
Section: Methods (mentioning)
confidence: 99%
“…The remaining reads were assembled into contigs by Trinityrnaseq [19]. The contigs longer than 500 bp were filtered and dereplicated (nucleotide identity > 95% and coverage rate > 80%) by CD-HIT [20] and aligned against the nonredundant protein database from NCBI (updated in March 2021) for taxonomic classification using Diamond blastX [21]. The BLASTX result was processed by the LCA algorithm (weighted LCA percent = 75%, e-value = 1 × 10⁻⁵, min Support = 1) with MEGAN [22].…”
Section: Methods (mentioning)
confidence: 99%