Identification and reconstruction of microbial species from metagenomics wide genome sequencing data is an important and challenging task. Existing approaches rely on gene or contig co-abundance information across multiple samples and on k-mer composition information in the sequences. Here we use recent advances in deep learning to develop an algorithm that uses variational autoencoders to encode co-abundance and compositional information prior to clustering. We show that the deep network is able to integrate these two heterogeneous datasets without any prior knowledge and that our method outperforms the existing state of the art by reconstructing 1.8-8 times more highly precise and complete genome bins from three different benchmark datasets. Additionally, we apply our method to a gene catalogue of almost 10 million genes and 1,270 samples from the human gut microbiome. Here we are able to cluster 1.3-1.8 million extra genes and reconstruct 117-246 more highly precise and complete bins, of which 70 bins were completely new compared to previous methods. Our method, Variational Autoencoders for Metagenomic Binning (VAMB), is freely available at https://github.com/jakobnissen/vamb
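The two feature types named in this abstract, k-mer composition and co-abundance across samples, can be illustrated with a minimal sketch. This is not VAMB's implementation: the function names, the choice of k=4, the sum-to-one normalization, and the simple concatenation of the two vectors are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(sequence, k=4):
    """Normalized k-mer frequencies, ordered lexicographically over ACGT k-mers.

    Hypothetical helper: windows containing ambiguous bases (e.g. N) are skipped.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if all(base in ALPHABET for base in window):
            counts[window] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def abundance_profile(depths):
    """Scale per-sample read depths so they sum to 1 (assumed normalization)."""
    total = sum(depths)
    if total == 0:
        return [0.0] * len(depths)
    return [d / total for d in depths]


def contig_features(sequence, depths, k=4):
    """Concatenate composition and co-abundance into one feature vector.

    A vector like this could serve as input to an encoder before clustering;
    the exact encoding used by the authors is not described in the abstract.
    """
    return kmer_frequencies(sequence, k) + abundance_profile(depths)
```

For example, `contig_features("ACGTACGT", [2, 6, 2])` yields a vector of 256 tetranucleotide frequencies followed by three relative abundances, one per sample.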
Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and for genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers on other metrics. The results identify challenges and guide researchers in selecting methods for analyses.
While the field of computational protein design has witnessed amazing progression in recent years, folding properties still constitute a significant barrier towards designing new and larger proteins. In order to assess and improve folding properties of designed proteins, we have developed a genetics-based folding assay and selection system based on the essential enzyme, orotate phosphoribosyl transferase from Escherichia coli. This system allows for both screening of candidate designs with good folding properties and genetic selection of improved designs. Thus, we identified single amino acid substitutions in two failed designs that rescued poorly folding and unstable proteins. Furthermore, when these substitutions were transferred into a well-structured design featuring a complex folding profile, the resulting protein exhibited native-like cooperative folding with significantly improved stability. In protein design, a single amino acid can make the difference between folding and misfolding, and this approach provides a useful new platform to identify and improve candidate designs.
Despite the accelerating number of uncultivated virus sequences discovered in metagenomics and their apparent importance for health and disease, the human gut virome and its interactions with bacteria in the gastrointestinal tract are not well understood. This is partly due to a paucity of whole-virome datasets and limitations in current approaches for identifying viral sequences in metagenomics data. Here, combining a deep-learning-based metagenomics binning algorithm with paired metagenome and metavirome datasets, we develop Phages from Metagenomics Binning (PHAMB), an approach that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations. When applied to the Human Microbiome Project 2 (HMP2) dataset, PHAMB recovered 6,077 high-quality genomes from 1,024 viral populations and identified viral-microbial host interactions. PHAMB can be advantageously applied to existing and future metagenomes to illuminate viral ecological dynamics with other microbiome constituents.
Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the community-driven initiative for the Critical Assessment of Metagenome Interpretation (CAMI). In its second challenge, CAMI engaged the community to assess their methods on realistic and complex metagenomic datasets with long and short reads, created from ∼1,700 novel and known microbial genomes, as well as ∼600 novel plasmids and viruses. Altogether 5,002 results by 76 program versions were analyzed, representing a 22x increase in results. Substantial improvements were seen in metagenome assembly, some due to using long-read data. The presence of related strains was still challenging for assembly and genome binning, as was assembly quality for the latter. Taxon profilers demonstrated a marked maturation, with taxon profilers and binners excelling at higher bacterial taxonomic ranks, but underperforming for viruses and archaea. Assessment of clinical pathogen detection techniques revealed a need to improve reproducibility. Analysis of program runtimes and memory usage identified highly efficient programs, including some top performers on other metrics. The CAMI II results identify current challenges, but also guide researchers in selecting methods for specific analyses.
The medicinal plant Tripterygium wilfordii (Celastraceae) contains a pair of class II diterpene synthases (diTPS) of specialized labdane-type metabolism that, despite remarkably close homology, form strikingly different products. TwTPS21 catalyzes bicyclization of the linear C20 precursor geranylgeranyl diphosphate to ent-copal-8-ol diphosphate, while TwTPS14 forms kolavenyl diphosphate. To determine the amino acid signature controlling the functional divergence of the homologues, we modeled their structures based on an existing crystal structure of the Arabidopsis ent-copalyl diphosphate synthase, archetypal of diTPSs in general metabolism of gibberellin phytohormones. Of the residues differing between TwTPS21 and TwTPS14, two are located in the predicted active site, and we hypothesized that these are responsible for the functional differentiation of the enzymes. Using site-directed mutagenesis, we generated a panel of six variants in which one or both positions were exchanged between the enzymes. In coupled heterologous assays with a corresponding class I diTPS, TwTPS2, complete product interchange was observed in variants with both reciprocal mutations, while substitutions of either residue gave mixed product profiles. Two mutants, TwTPS14:Y265H and TwTPS21:A325V, also produced ent-copalyl diphosphate, highlighting the evolutionary potential of enzymes of this family to drive rapid diversification of plant diterpene biosynthesis through neo-functionalization. Our study contributes to the understanding of structure-function relations in plant class II diTPSs and complements previous mutational studies of Arabidopsis ent-copalyl diphosphate synthase with additional examples from the specialized metabolism of T. wilfordii.
Since late 2020, outbreaks of H5 highly pathogenic avian influenza (HPAI) viruses belonging to clade 2.3.4.4b have emerged in Europe. To investigate the evolutionary history of these viruses, we performed genetic characterization on the first HPAI viruses found in Denmark during the autumn of 2020. H5N8 viruses from 14 wild birds and poultry, as well as one H5N5 virus from a wild bird, were characterized by whole genome sequencing and phylogenetic analysis. The Danish H5N8 viruses were found to be genetically similar to each other and to contemporary European clade 2.3.4.4b H5N8 viruses, while the Danish H5N5 virus was shown to be a unique genotype from the H5N5 viruses that circulated at the same time in Russia, Germany, and Belgium. Genetic analyses of one of the H5N8 viruses revealed the presence of a substitution (PB2-M64T) that is highly conserved in human seasonal influenza A viruses. Our analyses showed that the late 2020 clade 2.3.4.4b HPAI H5N8 viruses were most likely new incursions introduced by migrating birds to overwintering sites in Europe, rather than the result of continued circulation of H5N8 viruses from previous introductions to Europe in 2016/2017 and early 2020.
One Health surveillance of antimicrobial resistance (AMR) depends on a harmonized method for detection of AMR. Metagenomics-based surveillance offers the possibility to compare resistomes within and between different target populations. Its potential to be embedded into policy in the future calls for a timely and integrated knowledge dissemination strategy. We developed a blended training (e-learning and a workshop) on the use of metagenomics in surveillance of pathogens and AMR. The objectives were to highlight the potential of metagenomics in the context of integrated surveillance, to demonstrate its applicability through hands-on training, and to raise awareness of bias factors. The target participants included staff of competent authorities responsible for AMR monitoring and academic staff. The training was organized in modules covering the workflow, requirements, benefits and challenges of surveillance by metagenomics. The training had 41 participants. The face-to-face workshop was essential to understand the expectations of the participants about the transition to metagenomics-based surveillance. After revision of the e-learning, we released it as a Massive Open Online Course (MOOC), now available at https://www.coursera.org/learn/metagenomics. This course has run in more than 20 sessions, with more than 3,000 learners enrolled, from more than 120 countries. Blended learning and MOOCs are useful tools to deliver knowledge globally and across disciplines. The released MOOC can be a reference knowledge source for international players in the application of metagenomics in surveillance.