2022
DOI: 10.1101/2022.08.22.504484
Preprint

VIRify: an integrated detection, annotation and taxonomic classification pipeline using virus-specific protein profile hidden Markov models

Abstract: The study of viral communities has revealed the enormous diversity and impact these biological entities have on a range of different ecosystems. These observations have sparked widespread interest in developing computational strategies that support the comprehensive characterization of viral communities based on sequencing data. Here we introduce VIRify, a new computational pipeline designed to provide a user-friendly and accurate functional and taxonomic characterization of viral communities. VIRify identifie…

Citations: cited by 7 publications (12 citation statements)
References: 79 publications
“…We also compare KMCP against established viral detection and annotation software PhaGCN (Shang et al., 2021) and VIRify (Rangel-Pineros et al., 2022) (Supplementary Section S1.4). The taxon identification accuracies at the family rank (the lowest common supported rank by the three tools) showed that all tools had a precision of 1, while KMCP had the highest average recall (0.971), followed by PhaGCN (0.600) and VIRify (0.543).…”
Section: Results (mentioning)
confidence: 99%
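
As an illustration of how a tool can reach a precision of 1 while its recall stays well below 1, the following minimal Python sketch computes both metrics at the family rank. The function name `precision_recall`, the sequence IDs, and the family labels are hypothetical and are not taken from the cited benchmark; the sketch also assumes that sequences left unassigned count against recall only.

```python
def precision_recall(predicted, truth):
    """Family-rank precision and recall.

    predicted: dict mapping sequence ID -> predicted family, or None if unassigned
    truth:     dict mapping sequence ID -> true family
    """
    # Only sequences that received a family label count toward precision.
    assigned = {s: fam for s, fam in predicted.items() if fam is not None}
    true_pos = sum(1 for s, fam in assigned.items() if truth.get(s) == fam)
    precision = true_pos / len(assigned) if assigned else 0.0
    # Every sequence with a known family counts toward recall.
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: five sequences, two of which the tool leaves unassigned.
truth = {
    "seq1": "FamilyA",
    "seq2": "FamilyA",
    "seq3": "FamilyB",
    "seq4": "FamilyB",
    "seq5": "FamilyB",
}
predicted = {
    "seq1": "FamilyA",
    "seq2": "FamilyA",
    "seq3": "FamilyB",
    "seq4": None,
    "seq5": None,
}

precision, recall = precision_recall(predicted, truth)
print(precision, recall)  # 1.0 0.6
```

In this toy case every assignment made is correct, so precision is 1, but two of the five sequences receive no label, so recall drops to 0.6.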
“…We presented here a novel strategy that exploits Viral Like Particle (VLP) enriched viromes as the first crucial step in a multiple-step pipeline for the identification and discovery of potentially new viral sequences. Compared to current methods to label as viral contigs of metagenomic origin [19,24,74], our approach maximizes the chances of identifying phages with no common characteristics in terms of sequence, structure, or derived features with known phages. This is derived from our specific focus on the metagenomic assemblies from carefully evaluated highly enriched viromes, under the assumption that they would yield more sequences of sure viral origin.…”
Section: Discussion (mentioning)
confidence: 99%
“…Several collections of profile HMMs are publicly available, including the more generic Pfam database [24], which is oriented towards protein families, and domain-specific resources, such as VFam [54], pVOGs [53], efam [58], PHROGs [59], and ViPhOGs [60], more oriented to the virology community. With the exception of ViPhOGs and VIRify [61], their accompanying taxonomic classification pipeline, the available models are not provided with cutoff scores, requiring the user to arbitrarily draw the line between significant and non-significant results. Not using cutoff scores at all means that unrelated sequences could be detected.…”
Section: Discussion (mentioning)
confidence: 99%
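
The role of per-model cutoff scores described in the statement above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not VIRify's implementation: the function `filter_hits`, the model names, the protein IDs, and all scores are hypothetical, and hits are assumed to be already parsed into `(model, protein, bit_score)` tuples.

```python
def filter_hits(hits, cutoffs):
    """Keep only hits whose bit score reaches the cutoff of the matching model.

    hits:    iterable of (model, protein, bit_score) tuples
    cutoffs: dict mapping model name -> minimum bit score
    """
    kept = []
    for model, protein, bit_score in hits:
        cutoff = cutoffs.get(model)
        # A model without a cutoff is skipped instead of guessing a threshold.
        if cutoff is not None and bit_score >= cutoff:
            kept.append((model, protein, bit_score))
    return kept


# Hypothetical cutoffs and hits, for illustration only.
cutoffs = {"model_A": 50.0, "model_B": 120.0}
hits = [
    ("model_A", "prot1", 75.2),   # above the model_A cutoff: kept
    ("model_A", "prot2", 31.0),   # below the model_A cutoff: dropped
    ("model_B", "prot3", 119.9),  # just below the model_B cutoff: dropped
    ("model_C", "prot4", 300.0),  # no cutoff available for model_C: dropped
]

significant = filter_hits(hits, cutoffs)
print(significant)
```

Dropping hits from models that lack a cutoff is one possible design choice; it avoids the arbitrary threshold that the statement warns about, at the cost of discarding potentially valid matches.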
“…However, these databases have not been updated in recent years and present some limitations, such as the highly biased representation of the different viral families and the low number of sequences used to build most of the models [34]. Some recently developed resources include RVDB-prot [55], a protein version of the Reference Viral DataBase (RVDB) [56], viralOGs/eggNOG v5.0 [57], efam [58], PHROGs [59], ViPhOGs [60,61], Cenote-Taker 2 hallmark gene HMM database [62], and IMG/VR [39], a database of viral genome sequences of cultivated and uncultivated viruses that includes thousands of profile HMMs. Most if not all available viral databases provide profile HMMs derived from MSAs containing orthologs, that is, sequences that are assumed to share a common ancestry and biological function.…”
Section: Introduction (mentioning)
confidence: 99%