Dimethylsulfoniopropionate (DMSP) is produced mainly by phytoplankton and bacteria. It is relatively abundant and ubiquitous in the marine environment, where bacterioplankton readily use it as a source of both carbon and sulfur. In one transformation pathway, part of the molecule becomes dimethylsulfide (DMS), which escapes into the atmosphere and plays an important role in sulfur exchange between the oceans and the atmosphere. Through the other dominant catabolic pathway, bacteria use DMSP as a sulfur source. During the past few years, a number of genes involved in DMSP transformation have been characterized. Identifying genes in taxonomic groups that are not amenable to conventional cultivation methods is challenging. Indeed, functional annotation of genes in environmental studies is not straightforward, because particular taxa are not well represented in the available sequence databases. Furthermore, many genes belong to families of paralogs with similar sequences but possibly different functions. In this study, we develop in silico approaches to infer the protein function of an environmentally important gene (dmdA) that carries out the first step in sulfur assimilation from DMSP. The method combines a set of tools to annotate a targeted gene in genome databases and metagenome assemblies. It will be useful for identifying genes that carry out key biochemical processes in the environment.
Background
Genome assembly of viruses with high mutation rates, such as Norovirus and other RNA viruses, or from metagenome samples, poses a challenge for the scientific community because several viral quasispecies and strains coexist. Furthermore, there is no standard method for obtaining whole-genome sequences from unrelated patients. After polyA RNA isolation and sequencing of samples from eight patients with acute gastroenteritis, we evaluated two de Bruijn graph assemblers (SPAdes and MEGAHIT), combined with four common pre-assembly strategies, and compared those yielding whole-genome Norovirus contigs.
Results
With SPAdes, reference-genome-guided strategies using the host and the target virus genomes offered no advantage over the assembly of unfiltered data; with MEGAHIT, only host genome filtering improved the assembly. MEGAHIT performed better than SPAdes in most samples, yielding complete genome sequences in most of them with all the strategies employed. Read binning with CD-HIT improved assembly when paired with different analysis strategies, particularly with SPAdes.
Conclusions
Not all metagenome assemblies are equal, and the choice of workflow depends on the species studied and on the steps that precede the analysis. Different approaches may be needed even for samples treated identically, because of high intra-host variability. We tested and compared different workflows for the accurate assembly of Norovirus genomes and established their assembly capacities for this purpose.