Identification and reconstruction of microbial species from metagenomics wide genome sequencing data is an important and challenging task. Existing approaches rely on gene or contig co-abundance information across multiple samples and k-mer composition information in the sequences. Here we use recent advances in deep learning to develop an algorithm that uses variational autoencoders to encode co-abundance and compositional information prior to clustering. We show that the deep network is able to integrate these two heterogeneous datasets without any prior knowledge and that our method outperforms the existing state of the art by reconstructing 1.8 to 8 times more highly precise and complete genome bins from three different benchmark datasets. Additionally, we apply our method to a gene catalogue of almost 10 million genes and 1,270 samples from the human gut microbiome. Here we are able to cluster 1.3 to 1.8 million extra genes and reconstruct 117 to 246 more highly precise and complete bins, of which 70 bins were completely new compared to previous methods. Our method, Variational Autoencoders for Metagenomic Binning (VAMB), is freely available at https://github.com/jakobnissen/vamb
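To make the two input signals named above concrete, the following is a minimal sketch of how a k-mer composition vector and a per-sample co-abundance profile could be computed for one contig and joined into a single feature vector before encoding. It is an illustration under stated assumptions, not VAMB's actual preprocessing: the function names `kmer_frequencies`, `abundance_profile`, and `build_features`, the default k = 4, and the simple sum-to-one normalization are all hypothetical choices made here.

```python
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return normalized k-mer frequencies for one DNA sequence.

    Hypothetical helper: the order of the output follows the
    lexicographic order of all k-mers over the alphabet ACGT.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def abundance_profile(depths):
    """Scale per-sample read depths of one contig so they sum to 1."""
    total = sum(depths)
    if total == 0:
        return [0.0] * len(depths)
    return [d / total for d in depths]


def build_features(sequence, depths, k=4):
    """Concatenate composition and co-abundance into one input vector."""
    return kmer_frequencies(sequence, k) + abundance_profile(depths)


if __name__ == "__main__":
    # Toy contig observed in two samples (made-up depths).
    features = build_features("ACGTACGT", [2.0, 6.0], k=2)
    print(len(features))
```

With k = 2 the composition part has 16 entries and the abundance part has one entry per sample. A model such as a variational autoencoder would take vectors of this kind as input and learn a joint low-dimensional representation for clustering.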
Summary
Lesions on DNA uncouple DNA synthesis from the replisome, generating stretches of unreplicated single-stranded DNA (ssDNA) behind the replication fork. These ssDNA gaps need to be filled in to complete DNA duplication. Gap-filling synthesis involves either translesion DNA synthesis (TLS) or template switching (TS). Controlling these processes, ubiquitylated PCNA recruits many proteins that dictate pathway choice, but the enzymes regulating PCNA ubiquitylation in vertebrates remain poorly defined. Here we report that the E3 ubiquitin ligase RFWD3 promotes ubiquitylation of proteins on ssDNA. The absence of RFWD3 leads to a profound defect in recruitment of key repair and signaling factors to damaged chromatin. As a result, PCNA ubiquitylation is inhibited without RFWD3, and TLS across different DNA lesions is drastically impaired. We propose that RFWD3 is an essential coordinator of the response to ssDNA gaps, where it promotes ubiquitylation to drive recruitment of effectors of PCNA ubiquitylation and DNA damage bypass.
Despite the accelerating number of uncultivated virus sequences discovered in metagenomics and their apparent importance for health and disease, the human gut virome and its interactions with bacteria in the gastrointestinal tract are not well understood. This is partly due to a paucity of whole-virome datasets and limitations in current approaches for identifying viral sequences in metagenomics data. Here, combining a deep-learning-based metagenomics binning algorithm with paired metagenome and metavirome datasets, we develop Phages from Metagenomics Binning (PHAMB), an approach that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations. When applied to the Human Microbiome Project 2 (HMP2) dataset, PHAMB recovered 6,077 high-quality genomes from 1,024 viral populations and identified viral-microbial host interactions. PHAMB can be advantageously applied to existing and future metagenomes to illuminate viral ecological dynamics with other microbiome constituents.
The many microbial communities around us form interactive and dynamic ecosystems called microbiomes. Though concealed from the naked eye, microbiomes govern and influence macroscopic systems including human health, plant resilience, and biogeochemical cycling. Such feats have attracted interest from the scientific community, which has recently turned to machine learning and deep learning methods to interrogate the microbiome and elucidate the relationships between its composition and function. Here, we provide an overview of how the latest microbiome studies harness the inductive prowess of artificial intelligence methods. We start by highlighting that microbiome data – being compositional, sparse, and high-dimensional – necessitates special treatment. We then introduce traditional and novel methods and discuss their strengths and applications. Finally, we discuss the outlook of machine and deep learning pipelines, focusing on bottlenecks and considerations to address them.
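As one illustration of the "special treatment" that compositional and sparse data call for, the sketch below applies a centered log-ratio (CLR) transform with a pseudocount to a vector of taxon counts. The CLR transform is a commonly used option for compositional data and is chosen here as an assumption; the abstract does not name a specific method, and the function name `clr_transform` and the pseudocount of 1.0 are hypothetical.

```python
import math


def clr_transform(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's taxon counts.

    A pseudocount is added first so that zero counts, which are
    frequent in sparse microbiome tables, do not produce log(0).
    """
    if not counts:
        return []
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log equals dividing by the geometric mean.
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    # Toy sample with a zero count (made-up numbers).
    print(clr_transform([120, 0, 30, 5]))
```

The transformed values sum to zero within each sample, which removes the dependence on sequencing depth that raw counts carry. The choice of pseudocount affects rare taxa most and would need to be checked for a given dataset.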
[Graphical abstract: Genome-Wide Association Study (GWAS). Higher: blood pressure, arthritides, neuropsychiatric conditions, malignancies. Lower: anaemias, lipidaemias, ischaemic heart disease. Also shown: genetically higher central obesity.]
Highlights: Variants in HFE and TMPRSS6 are associated with higher liver iron. There is genetic evidence that higher central obesity causes higher liver iron. Liver iron variants are not organ specific and associate with multiple diseases.
DNA interstrand crosslinks (ICLs) are cytotoxic lesions that threaten genome integrity. The Fanconi anemia (FA) pathway orchestrates ICL repair during DNA replication, with ubiquitylated FANCI-FANCD2 (ID2) marking the activation step that triggers incisions on DNA to unhook the ICL. Restoration of intact DNA requires the coordinated actions of polymerase ζ (Polζ)-mediated translesion synthesis (TLS) and homologous recombination (HR). While the proteins mediating FA pathway activation have been well characterized, the effectors regulating repair pathway choice to promote error-free ICL resolution remain poorly defined. Here, we uncover an indispensable role of SCAI in ensuring error-free ICL repair upon activation of the FA pathway. We show that SCAI forms a complex with Polζ and localizes to ICLs during DNA replication. SCAI-deficient cells are exquisitely sensitive to ICL-inducing drugs and display major hallmarks of FA gene inactivation. In the absence of SCAI, HR-mediated ICL repair is defective, and breaks are instead re-ligated by polymerase θ-dependent microhomology-mediated end-joining, generating deletions spanning the ICL site and radial chromosomes. Our work establishes SCAI as an integral FA pathway component, acting at the interface between TLS and HR to promote error-free ICL repair.