2017
DOI: 10.1101/099127
Preprint

Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software

Abstract: In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were gene…

Cited by 46 publications (64 citation statements)
References 53 publications
“…MAGs are obtained by grouping or 'binning' together assembled contigs with similar sequence composition, depth of coverage across one or more related samples and taxonomic affiliations [16,17]. Several tools have been developed that exploit these sources of information to produce genomes from metagenomic data [18-21] and there are ongoing efforts to evaluate the effectiveness of different approaches [22]. Although closed genomes have been obtained using metagenomic binning methods [10,23], MAGs are typically incomplete and may contain contigs from multiple strains or species due to challenges in distinguishing between related community members both in the assembly and binning processes [19,24].…”
mentioning
confidence: 99%
“…This is the strategy we used for designing the simulated dataset. We provided a set of comprehensive simulated metagenomic Very recently, a group of method developers published a consortium work on the critical assessment of metagenome interpretation (CAMI) [16]. It was the summary of a challenge for benchmarking many programs for metagenomic data using a data set generated from about 700 microorganisms and 600 viruses and plasmids.…”
Section: Conclusion and Discussion
mentioning
confidence: 99%
“…Simulated metagenomic data have already been used in many software benchmarking studies [7,10,12-16]. Several simulators have been developed to generate simulated metagenomic data [17], such as MetaSim [18], NeSSM [19], BEAR [20], and FASTQSim [21].…”
Section: Introduction
mentioning
confidence: 99%
“…This dataset is quite complex (232 genomes, two sample replicates). We also retrieved the results of two highest-performing automatic binning programs, MaxBin and Metawatt, in the CAMI challenge evaluation (Sczyrba et al, 2017). We took the simplest possible approach: we trained MLGEX on the genome bins derived by these methods and classified the contigs to the bins with the highest likelihood, thus ignoring all details of contig splitting, b or p-value calculation and the possibility of changing the number of genome bins.…”
Section: Genome Bin Refinement
mentioning
confidence: 99%
“…In our evaluation, we only used information provided to the contestants by the time of the challenge. We report the results for two settings for each method using the recall, the fraction of overall assigned contigs (bp), and the adjusted rand index (ARI) as defined in Sczyrba et al (2017). Both measures are dependent so that usually a tradeoff between them is chosen.…”
Section: Genome Bin Refinement
mentioning
confidence: 99%
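
The two measures named in the statement above, the fraction of assigned contigs (in bp) and the adjusted Rand index (ARI), can be illustrated with a minimal sketch. It assumes the standard unweighted ARI computed over contigs, which may differ from the exact definition in Sczyrba et al. (2017), and all function names and the `None` marker for unassigned contigs are hypothetical choices made for this illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Standard (unweighted) ARI between two labelings of the same items.

    Assumption: every item counts equally; this is not bp-weighted.
    """
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the labelings trivially agree
    # Pairs of items placed together in both labelings, in truth only, in pred only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)


def assigned_fraction_bp(lengths, bins):
    """Fraction of all base pairs that lie in contigs assigned to some bin.

    Assumption: an unassigned contig is marked with None.
    """
    total = sum(lengths)
    assigned = sum(n for n, b in zip(lengths, bins) if b is not None)
    return assigned / total if total else 0.0


def evaluate(lengths, truth, bins):
    """Return (ARI over assigned contigs only, assigned fraction in bp)."""
    keep = [i for i, b in enumerate(bins) if b is not None]
    ari = adjusted_rand_index([truth[i] for i in keep], [bins[i] for i in keep])
    return ari, assigned_fraction_bp(lengths, bins)
```

Because the ARI here is computed only over assigned contigs, a binner can raise it by leaving difficult contigs unassigned, which lowers the assigned fraction. Reporting both numbers together makes that tradeoff visible.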