Black soldier flies (BSFs, Hermetia illucens) are becoming a prominent research model, driven by the insects-as-food-and-feed and waste-bioconversion industries. Insect mass-rearing facilities are at risk from the spread of viruses, but so far none have been described in BSFs. To fill this knowledge gap, a bioinformatic approach was undertaken to discover viruses specifically associated with BSFs. First, BSF genomes were screened for the presence of endogenous viral elements (EVEs). This led to the discovery and mapping of seven orthologous EVEs, originating from five viral families, integrated into three BSF genomes. Second, a virus discovery pipeline was used to screen BSF transcriptomes. This led to the detection of a new exogenous totivirus that we named hermetia illucens totivirus 1 (HiTV1). Phylogenetic analyses showed that this virus belongs to a clade of insect-specific totiviruses and is closely related to the largest EVE, located on chromosome 1 of the BSF genome. Lastly, this EVE was found to express a small transcript in some BSFs infected by HiTV1. Altogether, this data-mining study showed that, far from being unscathed by viruses, BSFs bear traces of past interactions with several viral families and of present interactions with the exogenous HiTV1.
Over the last decades, metagenomics has highlighted the diversity of microorganisms in environmental and host-associated samples. Most public metagenomics repositories use annotation pipelines tailored for prokaryotes, regardless of the taxonomic origin of contigs. Consequently, eukaryotic contigs, which have intrinsically different gene features, are not optimally annotated. Using a bioinformatics pipeline, we filtered 7.9 billion contigs from 6,872 soil metagenomes in the JGI’s IMG/M database to identify eukaryotic contigs. We re-annotated genes using eukaryote-tailored methods, yielding 8 million eukaryotic proteins and over 300,000 orphan proteins lacking homology in public databases. By comparing our gene predictions with the initial JGI predictions on the same contigs, we confirmed that our pipeline improves the completeness and contiguity of eukaryotic proteins in soil metagenomes. The improved quality of eukaryotic proteins, combined with a more comprehensive assignment method, yielded more reliable taxonomic annotation. This dataset of eukaryotic soil proteins with improved completeness, quality and taxonomic annotation reliability is of interest to any scientist aiming to study the composition, biological functions and gene flux in soil communities involving eukaryotes.
Background: Over the last decades, shotgun metagenomics and metabarcoding have highlighted the diversity of microorganisms in environmental and host-associated samples. Most public repositories of assembled metagenomes use annotation pipelines tailored for prokaryotes, regardless of the taxonomic origin of contigs and metagenome-assembled genomes (MAGs). Consequently, eukaryotic contigs and MAGs, which have intrinsically different gene features, are not optimally annotated, resulting in an incorrect representation of the eukaryotic component of biodiversity, despite its biological relevance. Results: Using an automated analysis pipeline, we filtered 7.9 billion contigs from 6,873 soil metagenomes in the IMG/M database of the Joint Genome Institute to identify eukaryotic contigs. We re-annotated genes using eukaryote-tailored methods, yielding 8 million eukaryotic proteins. Of these, 5.6 million could be traced back to non-chimeric, higher-confidence eukaryotic contigs. Our pipeline improves the completeness, contiguity and quality of eukaryotic proteins. Moreover, the better quality of eukaryotic proteins, combined with a more comprehensive assignment method, also improves the taxonomic annotation. Conclusions: Using public soil metagenomic data, we provide a dataset of eukaryotic soil proteins with improved completeness and quality as well as a more reliable taxonomic annotation. This unique resource is of interest to any scientist aiming to study the composition, biological functions and gene flux in soil communities involving eukaryotes.