Background Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools for determining the taxonomic assignment, host range, and biological properties of the virus. Here we focus on prokaryotic viruses, which include phages and archaeal viruses. For these viruses, identifying the host is an essential step in characterization, because the virus relies on the host for survival. Currently, the viral host is determined either by culturing the virus, which is low-throughput, time-consuming, and expensive, or by computational prediction, which needs improvement in both accuracy and usability. Here we develop a Gaussian model that predicts hosts for prokaryotic viruses with better performance than previous computational methods.
Results We present Prokaryotic virus Host Predictor (PHP), a software tool that uses a Gaussian model to predict hosts for prokaryotic viruses, taking the differences in k-mer frequencies between viral and host genomic sequences as features. PHP gave a host prediction accuracy of 34% (genus level) on the VirHostMatcher benchmark dataset and 35% (genus level) on a new dataset containing 671 viruses and 60,105 prokaryotic genomes. The prediction accuracy exceeded that of two alignment-free methods (VirHostMatcher and WIsH, 28–34%, genus level). PHP also substantially outperformed these two alignment-free methods (24–38% vs 18–20%, genus level) when predicting hosts for prokaryotic viruses that cannot be predicted by the BLAST-based or the CRISPR-spacer-based methods alone. Requiring a minimal score for making predictions (thresholding) and taking the consensus of the top 30 predictions further improved the host prediction accuracy of PHP.
Conclusions The Prokaryotic virus Host Predictor software tool provides an intuitive and user-friendly API for the Gaussian model described herein. This work will facilitate the rapid identification of hosts for newly identified prokaryotic viruses in metagenomic studies.
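The feature construction described in this abstract (differences in k-mer frequencies between a virus and each candidate host, scored by a Gaussian model) can be illustrated with a minimal sketch. This is not the PHP implementation: the k-mer length `k=4`, the diagonal (independent-feature) Gaussian, and the function names `kmer_freqs`, `diff_features`, `gaussian_log_score` and `rank_hosts` are all assumptions made for illustration, and `mean` and `var` stand in for parameters that would have to be fitted on known virus-host pairs.

```python
import math
from itertools import product


def kmer_freqs(seq, k=4):
    """Relative frequency of every k-mer over the ACGT alphabet in seq."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def diff_features(virus_seq, host_seq, k=4):
    """Per-k-mer frequency difference between a virus and a host."""
    v = kmer_freqs(virus_seq, k)
    h = kmer_freqs(host_seq, k)
    return [a - b for a, b in zip(v, h)]


def gaussian_log_score(features, mean, var):
    """Log-density of the features under a diagonal Gaussian (assumed form)."""
    return sum(
        -0.5 * math.log(2 * math.pi * s) - (x - m) ** 2 / (2 * s)
        for x, m, s in zip(features, mean, var)
    )


def rank_hosts(virus_seq, hosts, mean, var, k=4):
    """Return host names sorted from highest to lowest score."""
    scored = [
        (gaussian_log_score(diff_features(virus_seq, seq, k), mean, var), name)
        for name, seq in hosts.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]
```

With a zero mean and a small shared variance, a host whose k-mer composition resembles that of the virus scores higher than one whose composition differs. The thresholding and top-30 consensus steps mentioned in the abstract would operate on the ranked list and are not shown here.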
One major threat to global food security that requires immediate attention is the increasing incidence of host shift and host expansion in a growing number of pathogenic fungi, together with the emergence of new pathogens. The threat is all the more alarming because efforts to improve yield quality and quantity encourage the cultivation of uniform plants with low genetic diversity, which are increasingly susceptible to emerging pathogens. However, the influence of host genome differentiation on pathogen genome differentiation, and its contribution to pathogen emergence and adaptability, is still obscure. Here, we compared the genome sequences of 6 isolates of Magnaporthe species obtained from three different host plants. We demonstrated the evolutionary relationship between Magnaporthe species and the influence of host differentiation on the pathogens. Phylogenetic analysis showed that the evolution of the pathogen corresponds directly with host divergence, suggesting that host-pathogen interaction has led to co-evolution. Furthermore, we identified an asymmetric selection pressure on Magnaporthe species. Oryza sativa-infecting isolates experienced stronger directional selection from the host, which in turn tends to lower the genetic diversity of their genomes. We concluded that frequent gene loss or gain, new transposon acquisition and sequence divergence are host adaptability mechanisms for Magnaporthe species, and that this co-evolutionary process is largely driven by directional selection from host plants.
Viruses have caused much mortality and morbidity in humans and pose a serious threat to global public health. The catalogue of viruses with the potential to infect humans is still far from complete. Novel viruses have been discovered at an unprecedented pace with the rapid development of viral metagenomics. However, there is still a lack of methodology for rapidly identifying novel viruses with the potential to infect humans. This study built several machine learning models to discriminate human-infecting viruses from other viruses based on the frequency of k-mers in the viral genomic sequences. The k-nearest neighbor (KNN) model can predict human-infecting viruses with an accuracy of over 90%. The performance of the KNN model built on short contigs (≥1 kb) is comparable to that of the model built on viral genomes. We further validated this KNN model on a reported human blood virome, obtaining an accuracy of over 80% based on very short raw reads (150 bp). Our work demonstrates a conceptual and generic protocol for the discovery of novel human-infecting viruses in viral metagenomics studies.
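The classification idea in this abstract, a k-nearest neighbor vote over k-mer frequency vectors, can be sketched as follows. This is a toy illustration under stated assumptions, not the study's model: the k-mer length `k=2`, the Euclidean distance, the value `n_neighbors=3`, and the function names `kmer_freqs` and `knn_predict` are all hypothetical choices, since the abstract does not state them.

```python
import math
from collections import Counter
from itertools import product


def kmer_freqs(seq, k=2):
    """Relative frequency of every k-mer over the ACGT alphabet in seq."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # ignore windows with ambiguous bases
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def knn_predict(query, training, n_neighbors=3, k=2):
    """Majority label among the n_neighbors training sequences closest to query.

    training is a list of (sequence, label) pairs; distance is Euclidean
    distance between k-mer frequency vectors (an assumed choice).
    """
    q = kmer_freqs(query, k)
    dists = sorted(
        (math.dist(q, kmer_freqs(seq, k)), label) for seq, label in training
    )
    votes = Counter(label for _, label in dists[:n_neighbors])
    return votes.most_common(1)[0][0]
```

Because the features are composition frequencies rather than alignments, the same function applies unchanged to full genomes, contigs, or short reads, although very short inputs give noisier frequency estimates.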
The life-threatening coronaviruses MERS-CoV, SARS-CoV-1 and SARS-CoV-2 (SARS-CoV-1/2) have caused and will continue to cause enormous morbidity and mortality in humans. Virus-encoded noncoding RNAs are poorly understood in coronaviruses. Data mining of viral-infection-related RNA-sequencing data has identified 28 754, 720 and 3437 circRNAs encoded by MERS-CoV, SARS-CoV-1 and SARS-CoV-2, respectively. MERS-CoV exhibits a much more prominent ability to encode circRNAs in all genomic regions than SARS-CoV-1/2. Viral circRNAs typically exhibit low expression levels. Moreover, the majority of the viral circRNAs are expressed only in the late stage of viral infection. Analysis of the competitive interactions of viral circRNAs, human miRNAs and mRNAs in MERS-CoV infections reveals that viral circRNAs up-regulated genes related to mRNA splicing and processing in the early stage of viral infection, and regulated genes involved in diverse functions, including cancer, metabolism, autophagy and viral infection, in the late stage. Similar analysis in SARS-CoV-2 infections reveals that its viral circRNAs down-regulated genes associated with the metabolic processes of cholesterol, alcohol and fatty acid, and up-regulated genes associated with cellular responses to oxidative stress, in the late stage of viral infection. A few genes regulated by viral circRNAs from both MERS-CoV and SARS-CoV-2 were enriched in several biological processes such as response to reactive oxygen and centrosome localization. This study provides the first glimpse into viral circRNAs in three deadly coronaviruses and would serve as a valuable resource for further studies of circRNAs in coronaviruses.
Circular RNAs (circRNAs) are covalently closed long noncoding RNAs critical in diverse cellular activities and multiple human diseases. Several cancer-related viral circRNAs have been identified in double-stranded DNA (dsDNA) viruses, yet no systematic study of viral circRNAs has been reported. Herein, we performed a systematic survey of 11 924 circRNAs from 23 viral species by computational prediction of viral circRNAs from viral-infection-related RNA sequencing data. Besides the dsDNA viruses, our study also revealed many circRNAs in single-stranded RNA viruses and retro-transcribing viruses, such as the Zika virus, the Influenza A virus, the Zaire ebolavirus, and the Human immunodeficiency virus 1. Most viral circRNAs had reverse complementary sequences or repeated sequences in the sequences flanking the back-splice sites. Most viral circRNAs were expressed only in a specific cell line or tissue in a specific species. Functional enrichment analysis indicated that the viral circRNAs from dsDNA viruses were involved in KEGG pathways associated with cancer. All viral circRNAs presented in the current study are stored and organized in VirusCircBase, which is freely available at http://www.computationalbiology.cn/ViruscircBase/home.html and is the first virus circRNA database. VirusCircBase forms a fundamental atlas for the further exploration and investigation of viral circRNAs in the context of public health.
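The observation that flanking sequences of back-splice sites often contain reverse complementary sequences can be made concrete with a small check. The sketch below is only an illustration and not the survey's pipeline: it assumes sequences in the DNA alphabet, an exact-match criterion, and a hypothetical minimum match length `min_len=6`; the function names are likewise invented for this example.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_rc_flanks(upstream, downstream, min_len=6):
    """True if some min_len-mer in the upstream flank has its exact
    reverse complement somewhere in the downstream flank."""
    upstream = upstream.upper()
    downstream = downstream.upper()
    for i in range(len(upstream) - min_len + 1):
        window = upstream[i:i + min_len]
        if reverse_complement(window) in downstream:
            return True
    return False
```

A real analysis would more likely allow mismatches, for example by aligning one flank against the reverse complement of the other, but the exact-match version shows the underlying test.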