Culture-independent approaches have recently shed light on the genomic diversity of viruses of prokaryotes. One fundamental question when trying to understand their ecological roles is: which host do they infect? To tackle this issue we developed a machine-learning approach named Random Forest Assignment of Hosts (RaFAH), which uses scores to 43,644 protein clusters to assign hosts to complete or fragmented genomes of viruses of Archaea and Bacteria. RaFAH displayed performance comparable with that of other methods for virus-host prediction in three different benchmarks encompassing viruses from RefSeq, single amplified genomes, and metagenomes. RaFAH was applied to assembled metagenomic datasets of uncultured viruses from eight different biomes of medical, biotechnological, and environmental relevance. Our analyses led to the identification of 537 sequences of archaeal viruses representing unknown lineages, whose genomes encode novel auxiliary metabolic genes, shedding light on how these viruses interfere with the host molecular machinery.
The SAR11 clade is one of the most abundant bacterioplankton groups in surface waters of most of the oceans and lakes. However, only 15 SAR11 phages have been isolated thus far, and only one of them belongs to the Myoviridae family (pelagimyophages). Here, we have analyzed 26 sequences of myophages that putatively infect the SAR11 clade. They have been retrieved by mining ca. 45 Gbp aquatic assembled cellular metagenomes and viromes. Most of the myophages were obtained from the cellular fraction (0.2 μm), indicating a bias against this type of virus in viromes. We have found the first myophages that putatively infect Candidatus Fonsibacter (freshwater SAR11) and another group putatively infecting bathypelagic SAR11 phylogroup Ic. The genomes have similar sizes and maintain overall synteny in spite of low average nucleotide identity values, revealing high similarity to marine cyanomyophages. Pelagimyophages recruited metagenomic reads widely from several locations but always much more from cellular metagenomes than from viromes, opposite to what happens with pelagipodophages. Comparing the genomes resulted in the identification of a hypervariable island that is related to host recognition. Interestingly, some genes in these islands could be related to host cell wall synthesis and coinfection avoidance. A cluster of curli-related proteins was widespread among the genomes, although its function is unclear.
IMPORTANCE: SAR11 clade members are among the most abundant bacteria on Earth. Their study is complicated by their great diversity and difficulties in being grown and manipulated in the laboratory. On the other hand, and due to their extraordinary abundance, metagenomic data sets provide enormous richness of information about these microbes. Given the major role played by phages in the lifestyle and evolution of prokaryotic cells, the contribution of several new bacteriophage genomes preying on this clade opens windows into the infection strategies and life cycle of its viruses. Such strategies could provide models of attack of large-genome phages preying on streamlined aquatic microbes.
Vibrio vulnificus is an emergent marine pathogen and is the cause of a deadly septicemia. However, the genetic factors that differentiate its clinical and environmental strains and its several biotypes remain mostly enigmatic. In this work, we investigated the underlying genomic properties and population dynamics of the V. vulnificus species to elucidate the traits that make these strains emerge as human pathogens. The acquisition of different ecological determinants could have allowed the development of highly divergent clusters with different lifestyles within the same environment. However, we identified strains from both clusters in the mucosa of aquaculture species, indicating that manmade niches are bringing strains from the two clusters together, posing a potential risk of recombination and of emergence of novel variants. We propose a new evolutionary model that provides a perspective that could be broadly applicable to other pathogenic vibrios and facultative bacterial pathogens to pursue strategies to prevent their infections.
Pathogen emergence is a complex phenomenon that, despite its public health relevance, remains poorly understood. Vibrio vulnificus, an emergent human pathogen, can cause a deadly septicaemia with over 50% mortality rate. To date, the ecological drivers that lead to the emergence of clinical strains and the unique genetic traits that allow these clones to colonize the human host remain mostly unknown. We recently surveyed a large estuary in eastern Florida, where outbreaks of the disease frequently occur, and found endemic populations of the bacterium. We established two sampling sites and observed strong correlations between location and pathogenic potential. One site is significantly enriched with strains that belong to one phylogenomic cluster (C1), to which the majority of clinical strains belong. Interestingly, strains isolated from this site exhibit phenotypic traits associated with clinical outcomes, whereas strains from the second site belong to a cluster that rarely causes disease in humans (C2). Analyses of C1 genomes indicate unique genetic markers in the form of clinical-associated alleles with a potential role in virulence. Finally, metagenomic and physicochemical analyses of the sampling sites indicate that this marked cluster distribution and genetic traits are strongly associated with distinct biotic and abiotic factors (e.g., salinity, nutrients, or biodiversity), revealing how ecosystems generate selective pressures that facilitate the emergence of specific strains with pathogenic potential in a population. This knowledge can be applied to assess the risk of pathogen emergence from environmental sources and integrated toward the development of novel strategies for the prevention of future outbreaks.
We explored the vast genetic diversity of environmental viruses by sequencing a cellular metagenome (as opposed to a virome) with high-fidelity long reads (in this case, PacBio CCS). This approach resulted in the recovery of a representative sample of the viral population, and it performed better (more phage contigs, larger average contig size) than Illumina sequencing applied to the same sample.
The increasing demand for products for human consumption is leading to the fast-growing expansion of numerous food sectors such as marine aquaculture (mariculture). However, excessive input of nutrients and pollutants modifies marine ecosystems. Here, we applied a metagenomic approach to investigate these perturbations in samples from marine farms of gilthead seabream cultures. Results revealed dysbiosis and functional imbalance within the net cage with a unique structure, with little interference with samples from the fish microbiota or those collected far away from the coast. Remarkably, below the cage the prokaryotic community was highly similar to the marine microbiome of photic offshore samples. We recovered 48 novel metagenome-assembled genomes. Metagenomic recruitment revealed a significant change in the microbial community, which was dominated by several Proteobacteria orders (Sphingomonadales, Pseudomonadales, Caulobacterales and Rhizobiales). Genomic potential for bioremediation processes, including nitrate removal through aerobic denitrification and degradation of aromatic compounds and other toxic products, was enriched in these microbes. The detrimental side effects were the increased number of antimicrobial resistance genes and the presence of potentially emergent pathogens. Knowledge of this metabolic diversity and the microbes involved in ecological balance recovery can be used to reduce the environmental impact of these practices.
Pathogen emergence remains one of the most pressing public health concerns of our times. Here, using the agent of cholera, the Vibrio cholerae pandemic cholera group (PCG), as a model system, we investigate the evolutionary dynamics that lead to the emergence of human pathogens from environmental populations. Genomic comparison of over 1,100 V. cholerae genomes, including novel isolates from this study, reveals a generalized cluster-driven phylogeny and evolution of the species. Emergence of PCG is largely based on modular acquisition of mobile genetic elements including small gene clusters and allelic variations, with distinct phylogenomic clusters acting as differential reservoirs of virulence factors. Surprisingly, PCG encodes few unique genes, which are mostly encoded within the super-integron; however, many genes within the group exhibit an aberrant degree of positive selection. Our analyses provide a compelling scenario for the emergence of pandemic clones in V. cholerae, establishing a blueprint for other bacterial pathogens.
Viruses of prokaryotes are extremely abundant and diverse. Culture-independent approaches have recently shed light on the biodiversity of these biological entities. One fundamental question when trying to understand their ecological roles is: which host do they infect? To tackle this issue we developed a machine-learning approach named Random Forest Assignment of Hosts (RaFAH), based on the analysis of nearly 200,000 viral genomes. RaFAH outperformed other methods for virus-host prediction (F1-score = 0.97 at the level of phylum). RaFAH was applied to diverse datasets encompassing genomes of uncultured viruses derived from eight different biomes of medical, biotechnological, and environmental relevance, and was capable of accurately describing these viromes. This led to the discovery of 537 genomic sequences of archaeal viruses. These viruses represent previously unknown lineages and their genomes encode novel auxiliary metabolic genes, which shed light on how these viruses interfere with the host molecular machinery. RaFAH is available at https://sourceforge.net/projects/rafah/.