2022
DOI: 10.1101/2022.07.28.501908
Preprint

iPHoP: an integrated machine-learning framework to maximize host prediction for metagenome-assembled virus genomes

Abstract: The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived genomes lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in reca…

Cited by 45 publications (67 citation statements)
References 78 publications
“… (F) Phanta’s abundance estimates for Bifidobacterium and predicted Bifidobacterium phages in bulk metagenomes from infants in the four-month cohort (who had a range of diets). This analysis was facilitated by one of Phanta’s provided post-processing scripts, along with host genus predictions that were made by iPHoP [65] and are provided in Phanta’s default database.…”
Section: Results; citation type: mentioning
confidence: 99%
“… (E) Distribution of predicted host genera for viral species in various prevalence categories (e.g., category 75-100 represents the top 25% of viruses in terms of prevalence). These results are based on host genus predictions that were made using iPHoP [65] and are provided in Phanta’s default database.…”
Section: Results; citation type: mentioning
confidence: 99%
“…This analysis required us to link phages to host bacteria, which we did primarily via CRISPR-Cas spacer targeting. This has been done many times previously [26][27][28][29] and is believed to be generally robust given that the spacers in a CRISPR locus of a host bacterial genome derive directly from phage genomes [28][43]. Supporting these CRISPR spacer-based links are several lines of evidence.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…As CRISPR spacers are fragments of phage genomes stored within CRISPR-Cas systems, a common technique used to link phages to their bacterial hosts is via spacer-phage matching [26][27][28][29]. To find CRISPR-Cas systems encoded within SGA bacteria, we began with a previously compiled database that contained 862 genomes from the Saccharibacteria, Gracilibacteria, and Absconditabacteria (SGA) lineages [30] (Supp.…”
Section: CRISPR-Cas Systems Within Saccharibacteria Gracilibacteria A...; citation type: mentioning
confidence: 99%