The interpretation of genomic, transcriptomic and other microbial ‘omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes containing 4 billion genes that passed stringent criteria and have been consistently annotated using multiple functional and taxonomic databases including mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at http://progenomes.embl.de/
The bacterial SAR324 cluster is ubiquitous and abundant in the ocean, especially around hydrothermal vents and in the deep sea, where it can account for up to 30% of the whole bacterial community. According to a new taxonomy generated using multiple universal protein-coding genes (instead of the previously used 16S rRNA single-gene marker), the former Deltaproteobacteria cluster SAR324 has been classified since 2018 as its own phylum. Yet very little is known about its phylogeny and metabolic potential. We downloaded all 65 publicly available SAR324 genomes from all natural environments and reconstructed 18 new genomes using publicly available oceanic metagenomic data and unpublished data from the waters underneath the Ross Ice Shelf. We calculated a global SAR324 phylogenetic tree and identified six clusters (namely 1A, 1B, 2A, 2B, 2C and 2D) within this clade. Genome annotation and metatranscriptome read mapping showed that SAR324 clades possess a flexible array of genes suited for survival in various environments. Clades 2A and 2C are mostly present in the surface mesopelagic layers of global oceans, while clade 2D dominates in deeper regions. Our results show that SAR324 has a versatile and broad metabolic potential, including many heterotrophic, but also autotrophic, pathways. While one surface-water-associated clade (2A) seems to use proteorhodopsin to gain energy from solar radiation, some deep-sea genomes from clade 2D contain the complete Calvin–Benson–Bassham cycle gene repertoire to fix carbon. This, in addition to a variety of other genes and pathways for both oxic (e.g., dimethylsulfoniopropionate degradation) and anoxic (e.g., dissimilatory sulfate reduction, anaerobic benzoate degradation) conditions, can help explain the ubiquitous presence of SAR324 in aquatic habitats.