Recovering high-quality metagenome-assembled genomes (MAGs) from complex microbial ecosystems remains challenging. Recently, high-throughput chromosome conformation capture (Hi-C) has been applied to study multiple genomes simultaneously in natural microbial communities. We develop HiCBin, a novel open-source pipeline that resolves high-quality MAGs from Hi-C contact maps. HiCBin employs the HiCzin normalization method and the Leiden clustering algorithm, and is the first binning pipeline to incorporate spurious contact detection. HiCBin is validated on one synthetic and two real metagenomic samples and is shown to outperform existing Hi-C-based binning methods. HiCBin is available at https://github.com/dyxstat/HiCBin.
High-throughput chromosome conformation capture (Hi-C) has recently been applied to natural microbial communities and has shown great potential for studying multiple genomes simultaneously. Several extraneous factors may influence chromosomal contacts, making the normalization of Hi-C contact maps essential for downstream analyses. However, the current paucity of metagenomic Hi-C normalization methods and the neglect of spurious inter-species contacts weaken the interpretability of the data. Here, we report on two types of biases in metagenomic Hi-C experiments, explicit biases and implicit biases, and introduce HiCzin, a parametric model that corrects both types of biases and removes spurious inter-species contacts. We demonstrate that metagenomic Hi-C contact maps normalized by HiCzin show lower biases, a higher capability to detect spurious contacts, and better performance in metagenomic contig clustering. The HiCzin software and Supplementary Material are available at https://github.com/dyxstat/HiCzin.
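The general idea of normalizing contact counts and then discarding the weakest normalized contacts can be illustrated with a small sketch. This is not the HiCzin parametric model: it is a deliberately simplified stand-in that divides each raw count by the product of two per-contig bias factors. The function names, the bias values, the toy contigs, and the 25% cutoff are all hypothetical and chosen only for illustration.

```python
def normalize_contacts(raw_contacts, bias):
    """Divide each raw count by the product of the two contigs' bias factors.

    Simplified stand-in for a real normalization model (assumption).
    """
    return {
        (a, b): count / (bias[a] * bias[b])
        for (a, b), count in raw_contacts.items()
    }


def drop_weakest(normalized, fraction):
    """Discard the given fraction of contacts with the smallest normalized values."""
    ranked = sorted(normalized.items(), key=lambda item: item[1])
    n_drop = int(len(ranked) * fraction)
    return dict(ranked[n_drop:])


# Toy data (hypothetical): raw Hi-C contact counts between contig pairs.
raw = {
    ("c1", "c2"): 40,
    ("c1", "c3"): 2,
    ("c2", "c3"): 3,
    ("c3", "c4"): 30,
}
# Hypothetical per-contig bias factors (e.g. summarizing length or coverage).
bias = {"c1": 2.0, "c2": 2.0, "c3": 1.0, "c4": 1.0}

norm = normalize_contacts(raw, bias)
kept = drop_weakest(norm, fraction=0.25)  # treat the weakest 25% as spurious
```

In this toy example the contact between c1 and c3 has the smallest normalized value and is the one discarded; a real pipeline would estimate the bias terms and the cutoff from the data rather than fixing them by hand.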
The introduction of high-throughput chromosome conformation capture (Hi-C) into metagenomics enables the reconstruction of high-quality metagenome-assembled genomes (MAGs) from microbial communities. Despite recent advances in recovering eukaryotic, bacterial, and archaeal genomes using Hi-C contact maps, few Hi-C-based methods are designed to retrieve viral genomes. Here we introduce ViralCC, a publicly available tool to recover complete viral genomes and detect virus-host pairs using Hi-C data. Unlike other Hi-C-based methods, ViralCC leverages the virus-host proximity structure as an information source complementary to the Hi-C interactions. Using mock and real metagenomic Hi-C datasets from several different microbial ecosystems, including the human gut, cow feces, and wastewater, we demonstrate that ViralCC outperforms existing Hi-C-based binning methods as well as state-of-the-art tools specifically dedicated to metagenomic viral binning. ViralCC can also reveal the taxonomic structure of viruses and virus-host pairs in microbial communities. When applied to a real wastewater metagenomic Hi-C dataset, ViralCC constructs a phage-host network, which is further validated using CRISPR spacer analyses. ViralCC is an open-source pipeline available at https://github.com/dyxstat/ViralCC.
Recovering high-quality metagenome-assembled genomes (MAGs) from complex microbial ecosystems remains challenging. Conventional shotgun-based binning approaches may struggle when multiple samples are not available. Recently, high-throughput chromosome conformation capture (Hi-C) has been applied to study multiple genomes simultaneously in natural microbial communities. Several Hi-C-based binning pipelines have been put forward and have yielded state-of-the-art results using a single sample. We identify normalization and clustering as two vital steps in Hi-C-based binning analyses, and develop HiCBin, a novel open-source pipeline that resolves high-quality MAGs from Hi-C contact maps. HiCBin employs the HiCzin normalization method and the Leiden community detection algorithm based on the Potts spin-glass model, and is the first binning pipeline to incorporate spurious contact detection. Using a metagenomic yeast sample with perfect ground truth for the species identity of each contig, we comprehensively evaluate how the normalization methods and clustering algorithms from HiCBin and other available metagenomic Hi-C analysis pipelines affect binning performance. We demonstrate that HiCzin and the Leiden algorithm achieve the best binning accuracy, and show that spurious contact detection can improve retrieval performance. On a human gut sample and a wastewater sample, we also validate our method and compare the ability of HiCBin to recover high-quality MAGs against other state-of-the-art Hi-C-based binning tools, including ProxiMeta, bin3C, and MetaTOR, and against one popular shotgun-based binning software, MetaBAT2. HiCBin provides the best performance and applicability in resolving MAGs, and is available at https://github.com/dyxstat/HiCBin.
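To make the Potts-model objective behind community detection more concrete, the sketch below scores a given partition of a weighted contig graph. It is not the Leiden algorithm and not HiCBin code: it only evaluates a quality function of the Reichardt-Bornholdt form with a configuration null model, which is an assumption about the exact variant used. The function name, the resolution value, and the toy graph are hypothetical.

```python
from collections import defaultdict


def potts_quality(edges, membership, resolution=1.0):
    """Score a partition of an undirected weighted graph.

    Computes (1 / 2m) * sum_c [ 2 * W_c - resolution * K_c**2 / (2m) ],
    where m is the total edge weight, W_c the edge weight inside
    community c, and K_c the summed strength of its nodes.
    """
    total_weight = sum(w for _, _, w in edges)
    strength = defaultdict(float)   # weighted degree of each node
    internal = defaultdict(float)   # edge weight inside each community
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w

    community_strength = defaultdict(float)
    for node, k in strength.items():
        community_strength[membership[node]] += k

    two_m = 2.0 * total_weight
    quality = 0.0
    for community, k_c in community_strength.items():
        quality += 2.0 * internal[community] - resolution * k_c * k_c / two_m
    return quality / two_m


# Toy graph (hypothetical): two tightly linked contig pairs joined by a weak contact.
edges = [("a", "b", 10.0), ("c", "d", 10.0), ("b", "c", 1.0)]

good_q = potts_quality(edges, {"a": 0, "b": 0, "c": 1, "d": 1})
bad_q = potts_quality(edges, {"a": 0, "c": 0, "b": 1, "d": 1})
```

The partition that keeps strongly connected contigs together scores higher than the one that splits them, which is the property a community detection algorithm exploits when it searches over partitions. Raising the resolution parameter penalizes large communities more heavily and therefore tends to favor more, smaller bins.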