The gut virome is an incredibly complex part of the gut ecosystem. Gut viruses play a role in many disease states, but it is unknown to what extent the gut virome impacts everyday human health. New experimental and bioinformatic approaches are required to address this knowledge gap. Gut virome colonization begins at birth and is considered unique and stable in adulthood. The stable virome is highly specific to each individual and is modulated by varying factors such as age, diet, disease state, and use of antibiotics. The gut virome primarily comprises bacteriophages, predominantly order Crassvirales, also referred to as crAss-like phages, in industrialized populations and other Caudoviricetes (formerly Caudovirales). The stability of the virome’s regular constituents is disrupted by disease. Transferring the fecal microbiome, including its viruses, from a healthy individual can restore the functionality of the gut. It can alleviate symptoms of chronic illnesses such as colitis caused by Clostridioides difficile. Investigation of the virome is a relatively novel field, with new genetic sequences being published at an increasing rate. A large percentage of unknown sequences, termed ‘viral dark matter’, is one of the significant challenges facing virologists and bioinformaticians. To address this challenge, strategies include mining publicly available viral datasets, untargeted metagenomic approaches, and utilizing cutting-edge bioinformatic tools to quantify and classify viral species. Here, we review the literature surrounding the gut virome, its establishment, its impact on human health, the methods used to investigate it, and the viral dark matter veiling our understanding of the gut virome.
Bacteroidota are the most common bacteria in the human gut and are responsible for degrading complex polysaccharides that would otherwise remain undigested. The abundance of Bacteroides in the gut is shaped by phages such as crAssphages that infect and kill them. While close to 600 genomes have been identified computationally, only four have been successfully cultured. Here, we identify and characterize three novel crAssphage species isolated from wastewater and infecting the bacterial host Bacteroides cellulosilyticus WH2. We named the novel species Kehishuvirus winsdale (Bc01), Kolpuevirus frurule (Bc03), and Rudgehvirus redwords (Bc11), which span two different families and three genera. These phages may not have co-evolved with their respective bacterial hosts. The phages had a conserved gene arrangement with known crAssphages, but gene similarity within phages belonging to the same taxa was highly variable. Across the three species, only two structural genes encoding a hypothetical protein and a tail spike protein were similar. Evolutionary analysis revealed the tail spike protein is undergoing purifying selection and was predicted to bind to a TonB-dependent transporter on the host cell surface, suggesting a role for host specificity. This study expands the known crAssphage isolates and reveals insights into the crAssphage infection mechanism. The availability of pure cultures of multiple crAssphage infecting the same host provides an opportunity to perform controlled experiments on one of the most dominant members of the human enteric virome.
Background: Due to the ever-expanding gap between the number of proteins being discovered and their functional characterization, protein function inference remains a fundamental challenge in computational biology. Currently, known protein annotations are organized in human-curated ontologies; however, all possible protein functions may not be organized accurately. Meanwhile, recent advancements in natural language processing and machine learning have developed models which embed amino acid sequences as vectors in n-dimensional space. So far, these embeddings have primarily been used to classify protein sequences using manually constructed protein classification schemes. Results: In this work, we describe the use of amino acid sequence embeddings as a systematic framework for studying protein ontologies. Using a sequence embedding, we show that the bacterial carbohydrate metabolism class within the SEED annotation system contains 48 clusters of embedded sequences despite this class containing 29 functional labels. Furthermore, by embedding Bacillus amino acid sequences with unknown functions, we show that these unknown sequences form clusters that are likely to have similar biological roles. Conclusions: This study demonstrates that amino acid sequence embeddings may be a powerful tool for developing more robust ontologies for annotating protein sequence data. In addition, embeddings may be beneficial for clustering protein sequences with unknown functions and selecting optimal candidate proteins to characterize experimentally.
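The comparison of cluster counts with label counts described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not name its embedding model or clustering method, so the sketch swaps in a dipeptide-frequency vector as a stand-in "embedding" and single-linkage clustering on cosine similarity. The sequences, labels, function names, and threshold are all hypothetical toy values.

```python
from collections import Counter
from itertools import combinations
import math

def kmer_embedding(seq, k=2):
    """Stand-in embedding (assumption): unit-normalized k-mer count vector."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {kmer: c / norm for kmer, c in counts.items()}

def cosine(a, b):
    """Cosine similarity of two unit-normalized sparse vectors."""
    return sum(v * b.get(kmer, 0.0) for kmer, v in a.items())

def cluster(embeddings, threshold):
    """Single-linkage clustering: join any pair at or above the threshold."""
    parent = list(range(len(embeddings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(embeddings)), 2):
        if cosine(embeddings[i], embeddings[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(embeddings))]

# Hypothetical toy data: (sequence, curated functional label).
records = [
    ("AAAAAAAA", "label_X"),
    ("AAAAAAAC", "label_X"),
    ("GGGGGGGG", "label_X"),
    ("GGGGGGGT", "label_X"),
    ("WWWWWWWW", "label_Y"),
]
embeddings = [kmer_embedding(seq) for seq, _ in records]
cluster_ids = cluster(embeddings, threshold=0.8)

n_labels = len({label for _, label in records})
n_clusters = len(set(cluster_ids))
print(f"{n_labels} curated labels, {n_clusters} embedding clusters")
```

In this toy set a single curated label splits into two embedding clusters, which is the kind of mismatch the abstract reports (48 clusters against 29 labels) as a signal that an ontology class may be too coarse.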
Microbial communities found within the human gut have a strong influence on human health. Intestinal bacteria and viruses influence gastrointestinal diseases such as inflammatory bowel disease. Viruses infecting bacteria, known as bacteriophages, play a key role in modulating bacterial communities within the human gut. However, the identification and characterisation of novel bacteriophages remain a challenge. Available tools use similarities between sequences, nucleotide composition, and the presence of viral genes/proteins. Most available tools consider individual contigs to determine whether they are of viral origin. As a result of the challenges in viral assembly, fragmentation of viral genomes can occur, leading to the need for new approaches in viral identification. We introduce Phables, a new computational method to resolve bacteriophage genomes from fragmented viral metagenomic assemblies. Phables identifies bacteriophage-like components in the assembly graph, models each component as a flow network, and uses graph algorithms and flow decomposition techniques to identify genomic paths. Experimental results of viral metagenomic samples obtained from different environments show that over 80% of the bacteriophage genomes resolved by Phables have high quality and are longer than the individual contigs identified by existing viral identification tools.
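To make the idea of recovering genomic paths by flow decomposition concrete, here is a minimal sketch. It is not Phables's implementation: the abstract does not specify the algorithm, so this swaps in a simple greedy widest-path decomposition on a tiny hand-made graph. The node names, edge values (imagined as read coverage), and function names are hypothetical.

```python
from collections import defaultdict

def widest_path(residual, source, sink):
    """Return (path, bottleneck) for the source-to-sink path with the
    largest bottleneck over edges with remaining flow, or None."""
    adj = defaultdict(list)
    for (u, v), flow in residual.items():
        if flow > 0:
            adj[u].append(v)
    best = None
    stack = [(source, [source], float("inf"))]
    while stack:  # exhaustive DFS; fine for a small component
        node, path, width = stack.pop()
        if node == sink:
            if best is None or width > best[1]:
                best = (path, width)
            continue
        for nxt in adj[node]:
            if nxt not in path:  # do not revisit nodes on this path
                stack.append((nxt, path + [nxt],
                              min(width, residual[(node, nxt)])))
    return best

def greedy_flow_decomposition(edges, source, sink):
    """Greedily peel weighted paths off a flow until none remains."""
    residual = dict(edges)
    paths = []
    while True:
        found = widest_path(residual, source, sink)
        if found is None:
            break
        path, width = found
        for u, v in zip(path, path[1:]):
            residual[(u, v)] -= width
        paths.append((path, width))
    return paths

# Hypothetical component: a shared segment "a" that branches into two
# variants "b" and "c"; edge values stand in for coverage.
edges = {
    ("s", "a"): 5,
    ("a", "b"): 3,
    ("a", "c"): 2,
    ("b", "t"): 3,
    ("c", "t"): 2,
}
paths = greedy_flow_decomposition(edges, "s", "t")
for path, weight in paths:
    print(" -> ".join(path), "weight", weight)
```

Each recovered path corresponds to one candidate genome through the component, with its weight playing the role of that variant's abundance. A greedy decomposition is not guaranteed to use the fewest paths, which is why dedicated tools may rely on more careful optimization.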