Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy, and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
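The general idea of overlap-based read merging mentioned above can be illustrated with a minimal sketch. This is not BBMerge's algorithm, which the abstract does not describe in detail; it is a simplified illustration that ignores base quality scores and the k-mer-based gap assembly. The function names (`reverse_complement`, `merge_pair`) and the parameters (`min_overlap`, `max_mismatch_frac`) are hypothetical and chosen only for this example.

```python
from typing import Optional

# Complement table; N (unknown base) maps to itself.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-case A/C/G/T/N)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def merge_pair(
    fwd: str,
    rev: str,
    min_overlap: int = 10,        # hypothetical default, not taken from BBMerge
    max_mismatch_frac: float = 0.0,  # hypothetical default: require an exact overlap
) -> Optional[str]:
    """Merge a read pair if the end of `fwd` overlaps the start of revcomp(`rev`).

    Candidate overlaps are tried from longest to shortest. Returns the merged
    sequence, or None if no overlap of at least `min_overlap` bases passes the
    mismatch threshold.
    """
    rc = reverse_complement(rev)
    longest = min(len(fwd), len(rc))
    for olen in range(longest, min_overlap - 1, -1):
        tail = fwd[-olen:]   # last `olen` bases of the forward read
        head = rc[:olen]     # first `olen` bases of the reoriented reverse read
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        if mismatches <= max_mismatch_frac * olen:
            # Keep the forward read's bases in the overlap (no quality scores here).
            return fwd + rc[olen:]
    return None


if __name__ == "__main__":
    # A made-up 40 bp fragment sequenced as two 25 bp reads (10 bp overlap).
    fragment = "ACGTTGCAAGGCTTACGATCGGATCCTAGGCATTCGAACT"
    read1 = fragment[:25]
    read2 = reverse_complement(fragment[15:])
    print(merge_pair(read1, read2))
```

A production merger would typically weigh mismatches by base quality and guard against spurious overlaps in repetitive sequence; this sketch only shows the basic overlap search.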
Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution, with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. Applying the approach to samples from the water column of meromictic Sakinaw Lake, we show that while community structures at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Finally, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.
Mariprofundus ferrooxydans PV-1 has provided the first genome of the recently discovered Zetaproteobacteria subdivision. Genome analysis reveals a complete TCA cycle, the ability to fix CO2, carbon-storage proteins, and a sugar phosphotransferase system (PTS). The latter could facilitate the transport of carbohydrates across the cell membrane and possibly aid in formation of the stalk, a matrix composed of exopolymers and/or exopolysaccharides that is used to store oxidized iron minerals outside the cell. Two-component signal transduction system genes, including histidine kinases, GGDEF domain genes, and response regulators containing CheY-like receivers, are abundant and widely distributed across the genome. Most of these are located in close proximity to genes required for cell division, phosphate uptake and transport, exopolymer and heavy metal secretion, flagellar biosynthesis, and pilus assembly, suggesting that these functions are highly regulated. As in many other motile, microaerophilic bacteria, genes encoding aerotaxis and antioxidant functions (e.g., superoxide dismutases and peroxidases) are predicted to enable the cell to sense and respond to oxygen gradients, as would be required to maintain cellular redox balance in the specialized habitat where M. ferrooxydans resides. Comparative genomics with other Fe(II)-oxidizing bacteria residing in freshwater and marine environments revealed similar content, synteny, and amino acid similarity of coding sequences potentially involved in Fe(II) oxidation, signal transduction and response regulation, oxygen sensation and detoxification, and heavy metal resistance. This study has provided novel insights into the molecular nature of Zetaproteobacteria.
Generating sequence data of a defined community composed of organisms with complete reference genomes is indispensable for the benchmarking of new genome sequence analysis methods, including assembly and binning tools. Moreover, the validation of new sequencing library protocols and platforms to assess critical components such as sequencing errors and biases relies on such datasets. Here we report the next-generation metagenomic sequence data of a defined mock community (Mock Bacteria ARchaea Community; MBARC-26), composed of 23 bacterial and 3 archaeal strains with finished genomes. These strains span 10 phyla and 14 classes, cover a range of GC contents, genome sizes, and repeat content, and encompass a diverse abundance profile. Short-read Illumina and long-read PacBio SMRT sequences of this mock community are described. These data represent a valuable resource for the scientific community, enabling extensive benchmarking and comparative evaluation of bioinformatics tools without the need to simulate data. As such, these data can aid in improving our current sequence data analysis toolkit and spur interest in the development of new tools.
The genus Marinobacter is one of the most ubiquitous in the global oceans and is assumed to significantly impact various biogeochemical cycles. The genome structure and content of Marinobacter aquaeolei VT8 were analyzed and compared with those of other organisms with diverse adaptive strategies. Here, we report the many "opportunitrophic" genetic characteristics and strategies that M. aquaeolei has adopted to promote survival under various environmental conditions. Genome analysis revealed its metabolic potential to utilize oxygen and nitrate as terminal electron acceptors, iron as an electron donor, and urea, phosphonate, and various hydrocarbons as alternative N, P, and C sources, respectively. Miscellaneous sensory and defense mechanisms, apparently acquired via horizontal gene transfer, are involved in the perception of environmental fluctuations and in antibiotic, phage, toxin, and heavy metal resistance, enabling survival under adverse conditions, such as oil-polluted water. Multiple putative integrases, transposases, and plasmids appear to have introduced additional metabolic potential, such as phosphonate degradation. The genomic potential of M. aquaeolei and its similarity to other opportunitrophs are consistent with its cosmopolitan occurrence in diverse environments and highly variable lifestyles.