Published: 2014
DOI: 10.1038/nmeth.3103

Binning metagenomic contigs by coverage and composition

Abstract: Shotgun sequencing enables the reconstruction of genomes from complex microbial communities, but because assembly does not reconstruct entire genomes, it is necessary to bin genome fragments. Here we present CONCOCT, a new algorithm that combines sequence composition and coverage across multiple samples to automatically cluster contigs into genomes. We demonstrate high recall and precision on artificial as well as real human gut metagenome data sets.
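The abstract describes combining two signals per contig: sequence composition and coverage across multiple samples. The sketch below shows one way such a per-contig feature vector could be assembled. It assumes canonical tetranucleotide (k = 4) frequencies for composition, a pseudocount of 1, and a simple per-contig normalization of coverage across samples. All function names are hypothetical. This is not CONCOCT's implementation, which applies its own normalization and then clusters the vectors; the clustering step is not shown here.

```python
from itertools import product

# Maps each base to its complement; used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string (other symbols are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> list[str]:
    """Sorted k-mers over ACGT, one entry per k-mer/reverse-complement pair."""
    return sorted({min(km, revcomp(km))
                   for km in ("".join(p) for p in product("ACGT", repeat=k))})


def composition(seq: str, k: int = 4, pseudocount: float = 1.0) -> list[float]:
    """Canonical k-mer frequencies of one contig (pseudocount is an assumption)."""
    seq = seq.upper()
    keys = canonical_kmers(k)
    counts = dict.fromkeys(keys, pseudocount)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            counts[min(kmer, revcomp(kmer))] += 1
    total = sum(counts.values())
    return [counts[key] / total for key in keys]


def coverage_profile(coverages: list[float], pseudocount: float = 1.0) -> list[float]:
    """Per-sample coverage rescaled to sum to 1 for this contig (assumed normalization)."""
    shifted = [c + pseudocount for c in coverages]
    total = sum(shifted)
    return [c / total for c in shifted]


def contig_features(seq: str, coverages: list[float]) -> list[float]:
    """Concatenate composition and coverage into one vector for clustering."""
    return composition(seq) + coverage_profile(coverages)


# Hypothetical contig sequence and mean coverages in three samples.
features = contig_features("ACGTTGCA" * 50, [10.0, 0.0, 25.0])
print(len(features))  # 139: 136 canonical tetranucleotides + 3 samples
```

Because k-mers are collapsed with their reverse complements, a contig and its reverse complement get identical composition vectors, which matters since assemblers may output either strand. With only a few samples, the 136 composition dimensions far outnumber the coverage dimensions, so in practice the two blocks may need weighting or dimensionality reduction before clustering; this sketch does neither.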

Cited by 1,916 publications (1,737 citation statements)
References: 18 publications

“…The need for de novo reconstruction of microbial genomes from environmental samples through shotgun metagenomics data has given rise to advanced techniques and software platforms that can make sense of complex assemblies [52,53,14,15,16]. Our study demonstrates that these approaches can be effectively used in eukaryotic assembly projects for curation purposes.…”
Section: Results (mentioning)
confidence: 85%
“…Today, microbiologists often exploit two essential properties of bacterial and archaeal genomes to improve the "binning" step: (1) k-mer frequencies that are somewhat preserved throughout a single microbial genome [8], to identify contigs that likely originate from the same genome [9], and (2) a set of genes that occur in the vast majority of bacterial genomes as a single copy, to estimate the level of completion and contamination of genome bins [10,11,12]. These properties, along with differential coverage of contigs across multiple samples when such data exist, are routinely used to identify coherent microbial draft genomes in metagenomic assemblies [13,14,15,16].…”
Section: Introduction (mentioning)
confidence: 99%
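The second property in the passage above, a set of genes found as a single copy in most bacterial genomes, supports a simple bookkeeping estimate: a bin that contains most of the markers is likely fairly complete, and markers found more than once suggest contamination. The sketch below assumes the bin's genes have already been annotated with marker names; the function name and the four-marker list are hypothetical. It is a simplified count-based version, not the exact procedure of any particular published tool; established tools refine it considerably, for example with lineage-specific marker sets.

```python
from collections import Counter


def completeness_contamination(bin_genes: list[str], markers: list[str]) -> tuple[float, float]:
    """Percent completeness and contamination of a bin from single-copy marker hits.

    completeness  = markers seen at least once / total markers
    contamination = extra copies beyond the first, summed over markers / total markers
    """
    marker_set = set(markers)
    if not marker_set:
        raise ValueError("marker set must not be empty")
    # Count only genes that belong to the marker set; other genes are ignored.
    hits = Counter(g for g in bin_genes if g in marker_set)
    present = sum(1 for m in marker_set if hits[m] >= 1)
    extra_copies = sum(hits[m] - 1 for m in marker_set if hits[m] > 1)
    n = len(marker_set)
    return 100.0 * present / n, 100.0 * extra_copies / n


# Hypothetical four-marker set and bin annotation, for illustration only.
markers = ["rpsB", "rpsC", "rplB", "rplC"]
bin_genes = ["rpsB", "rpsC", "rpsC", "rplB", "hypothetical_protein"]
print(completeness_contamination(bin_genes, markers))  # (75.0, 25.0)
```

Here three of the four markers are present (75% complete) and one marker appears twice (25% contamination). Because every extra copy counts, contamination under this definition can exceed 100% for heavily mixed bins.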
“…Two HMP real datasets, one human gut dataset, and HiSeq/MiSeq datasets were used because such datasets were widely used in assessing other metagenomic tools (Alneberg et al, 2014; Kultima et al, 2012; Namiki et al, 2012; Wang et al, 2012; Wood and Salzberg, 2014). The HMP mock dataset was used because it was different from the other three datasets and contained only single-end reads.…”
Section: Mbmc Was Superior To Other Methods On Experimental Datasets (mentioning)
confidence: 99%
“…Recent techniques which repeatedly sample an environment, extracting a signal based on correlated changes in abundance to identify genomic content that is likely to belong to individual strains or populations of cells, have confidently obtained species resolution (Alneberg et al, 2013; Imelfort et al, 2014) and begun to work toward strain (genotype) resolution (Cleary et al, 2015). Inferring abundance per sample from contig coverage (Alneberg et al, 2013; Imelfort et al, 2014) or k-mer frequencies (Cleary et al, 2015), respectively, the strength of this discriminating signal is a function of community diversity, environmental variation and sampling depth; and represents a significant computational task.…”
Section: Introduction (mentioning)
confidence: 99%
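The passage above describes identifying contigs from the same population by correlated changes in abundance across repeated samples. One simple way to quantify that signal is the Pearson correlation between two contigs' per-sample coverage profiles. The sketch below log-transforms coverage first (an assumption, intended to keep a single deeply sequenced sample from dominating) and uses hypothetical function names and coverage values. It illustrates the co-abundance idea only and is not the method of any of the cited tools.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 samples)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return sxy / math.sqrt(sxx * syy)


def coabundance(cov_a: list[float], cov_b: list[float], pseudocount: float = 1.0) -> float:
    """Correlation of log-transformed per-sample coverage for two contigs."""
    log_a = [math.log(c + pseudocount) for c in cov_a]
    log_b = [math.log(c + pseudocount) for c in cov_b]
    return pearson(log_a, log_b)


# Hypothetical mean coverages of three contigs across four samples.
contig_a = [10.0, 40.0, 5.0, 80.0]
contig_b = [12.0, 45.0, 6.0, 90.0]  # rises and falls with contig_a
contig_c = [50.0, 5.0, 60.0, 2.0]   # opposite pattern
print(round(coabundance(contig_a, contig_b), 3))
print(round(coabundance(contig_a, contig_c), 3))
```

Contigs a and b track each other across samples and score close to 1, while contig c moves in the opposite direction and scores negative. As the passage notes, how well this separates genomes depends on how much abundances actually vary between samples and on sequencing depth; with few or very similar samples, unrelated contigs can correlate by chance.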