2023
DOI: 10.3389/fmicb.2023.1078760

Evaluation of computational phage detection tools for metagenomic datasets

Abstract: Introduction: As new computational tools for detecting phage in metagenomes are being rapidly developed, a critical need has emerged to develop systematic benchmarks. Methods: In this study, we surveyed 19 metagenomic phage detection tools, 9 of which could be installed and run at scale. Those 9 tools were assessed on several benchmark challenges. Fragmented reference genomes are used to assess the effects of fragment length, low viral content, phage taxonomy, robustness to eukaryotic contamination, and computation…
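
The abstract mentions fragmenting reference genomes to test the effect of fragment length. The following is a minimal sketch of that idea only, not the authors' actual pipeline: the function name `fragment_genome`, the non-overlapping windowing, and the `min_length` cutoff are all assumptions made for illustration.

```python
from typing import List, Optional


def fragment_genome(
    sequence: str,
    fragment_length: int,
    min_length: Optional[int] = None,
) -> List[str]:
    """Split a genome into consecutive, non-overlapping fragments.

    Hypothetical helper: fragments shorter than `min_length` (by default,
    shorter than `fragment_length`) are dropped, so a trailing partial
    fragment is discarded unless `min_length` is lowered.
    """
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    if min_length is None:
        min_length = fragment_length

    fragments = [
        sequence[start:start + fragment_length]
        for start in range(0, len(sequence), fragment_length)
    ]
    return [frag for frag in fragments if len(frag) >= min_length]
```

Running a detection tool on fragments produced at several values of `fragment_length` is one way to see how performance changes with contig size.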

Citations: cited by 14 publications (13 citation statements)
References: 73 publications (68 reference statements)
“…Notably, all four of these sequences were identified with the virus detection tools, VirFinder and DeepVirFinder, which are trained to identify prokaryotic viruses by k-mer based (deep) machine learning methods. Being homology-independent methods, they are known to have higher potential to identify novel sequences 37. Among the 788 genomes that remained unaligned to the NCBI viral RefSeq database, 632 were complete genomes and 156 were >90% complete (Figure 1C).…”
Section: Results (mentioning)
confidence: 99%
“…Another one benchmarked tools that specialize in identifying viruses from clinical samples 51 . Four benchmarking works 49,50,52,53 mainly used simulated viral and non-viral testing datasets that were sampled from publicly available complete viral and microbial genomes (e.g., NCBI RefSeq). A summary of the tested tools and testing datasets of each study can be found in Supplementary Table S2.…”
Section: Comparison To Other Existing Benchmarking Work (mentioning)
confidence: 99%
“…Several benchmarking studies have compared the performance of various virus identification tools [48][49][50][51][52][53] (Supplementary Table S2). Most of them used simulated sequencing data or sequencing data from mock community as testing datasets.…”
Section: Introduction (mentioning)
confidence: 99%
“…Current databases also show biases towards certain genera (Schackart III et al, 2023), which can skew benchmarking and the evaluation of different methods. To address this, we used a balanced benchmarking approach, ensuring each viral group corresponds to their predicted host genus, minimizing bias.…”
Section: Application II: Phage Sequence Analysis (mentioning)
confidence: 99%
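
One of the citation statements above describes VirFinder and DeepVirFinder as k-mer based, homology-independent methods. As a rough illustration of what a k-mer feature looks like, the sketch below computes normalized k-mer frequencies for a DNA sequence. It is not code from either tool; the function name `kmer_frequencies` and the default `k=4` are assumptions chosen for the example.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(sequence: str, k: int = 4) -> Dict[str, float]:
    """Return the relative frequency of every A/C/G/T k-mer in `sequence`.

    Windows containing other characters (for example N) are ignored.
    Hypothetical helper for illustration only.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()

    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

    # Fixed feature order: all 4**k possible k-mers over the DNA alphabet.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / total for kmer in kmers}
```

Because such a profile depends only on sequence composition and not on alignment to a reference database, a classifier trained on it can in principle score sequences with no close known relatives, which is the property the citing authors point to.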