2017
DOI: 10.1186/s13059-017-1299-7

Comprehensive benchmarking and ensemble approaches for metagenomic classifiers

Abstract: Background: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. Results: In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. To…

Cited by 283 publications (265 citation statements)
References 73 publications
“…Well characterized reference standards and controls are needed to ensure mNGS assay quality and stability over time. Most available metagenomic reference materials are highly customized to specific applications (for example, ZymoBIOMICS Microbial Community Standard for microbiome analyses and bacterial and fungal metagenomics 105 ) and/or focused on a more limited spectrum of organisms (for example, the National Institute of Standards and Technology (NIST) reference materials for mixed microbial DNA detection, which contain only bacteria 106 ). Thus, these materials may not be applicable to untargeted mNGS analyses.…”
Section: Reference Standards
mentioning
confidence: 99%
“…Customized data sets can be prepared to mimic input sequence data and expand the range of microorganisms detected through in silico analysis 37 . The use of standardized reference materials and NGS data sets is also helpful in comparative evaluation of different bioinformatics pipelines 105 .…”
Section: Bioinformatics Challenges
mentioning
confidence: 99%
“…However, while many methods have been proposed for taxonomic classification [12,13], the accuracy of these methods using different training databases has not been fully tested. This is an important issue, because as new genome data are generated, training data sets, such as the commonly used NCBI Reference Sequence Database (RefSeq) will change over time.…”
Section: /19
mentioning
confidence: 99%
“…Taxonomic classification is usually one of the first steps in a metagenomic pipeline [11]. Once these organisms are identified, they are then used in downstream analyses, such as alpha/beta diversity measures, ordination, feature selection, phenotype classification, etc. However, while many methods have been proposed for taxonomic classification [12,13], the accuracy of these methods using different training databases has not been fully tested. This is an important issue, because as new genome data are generated, training data sets, such as the commonly used NCBI Reference Sequence Database (RefSeq) will change over time.…”
mentioning
confidence: 99%
“…[Methods in Ecology and Evolution, Kahlke and Ralph] estimation implemented in the amplicon analysis framework QIIME (Caporaso et al, 2010) which uses the three best blast hits of a read for classification of amplicon sequences. Despite its reliance on well curated target databases, LCA algorithms have been shown to be highly accurate (McIntyre et al, 2017) and, depending on the comparison tool used, computationally efficient. Current LCA implementations, however, are restricted to NGS reads or short sequences and lack the ability to classify sequences such as predicted amino acid sequences, assembled contigs from genome and metagenome projects or the increasingly common long-read sequences produced by 3GS technologies.…”
mentioning
confidence: 99%
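
The lowest-common-ancestor (LCA) idea mentioned in the statement above, classifying a read from its three best hits, can be illustrated with a minimal Python sketch. This is not the QIIME implementation or any published tool's code: the function names, the `top_n` parameter, the child-to-parent dictionary and the toy taxonomy are all assumptions made for illustration, and hit scores are taken to be "higher is better" (as with a bit score).

```python
from typing import Dict, List, Optional, Tuple


def lineage(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from the root down to `taxon` (hypothetical helper)."""
    path = [taxon]
    while taxon in parent:  # the root has no entry in `parent`
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lowest_common_ancestor(taxa: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Deepest taxon shared by the lineages of all `taxa`, or None if none."""
    lineages = [lineage(t, parent) for t in taxa]
    lca = None
    # Walk the lineages from the root downwards, level by level.
    for level in zip(*lineages):
        if all(name == level[0] for name in level):
            lca = level[0]
        else:
            break
    return lca


def classify_read(
    hits: List[Tuple[str, float]], parent: Dict[str, str], top_n: int = 3
) -> Optional[str]:
    """Assign a read to the LCA of its `top_n` highest-scoring hits."""
    best = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]
    return lowest_common_ancestor([taxon for taxon, _ in best], parent)


# Toy taxonomy (assumed for illustration only): child -> parent.
PARENT = {
    "Escherichia coli": "Escherichia",
    "Escherichia albertii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
}

# Hypothetical hits for one read: (taxon, score).
HITS = [
    ("Escherichia coli", 200.0),
    ("Salmonella enterica", 150.0),
    ("Escherichia albertii", 180.0),
]

if __name__ == "__main__":
    print(classify_read(HITS, PARENT))           # LCA of the three best hits
    print(classify_read(HITS, PARENT, top_n=2))  # LCA of the two best hits
```

In this toy case the three best hits span two genera, so the read is placed at the family level, while restricting to the two best hits keeps it at the genus level. That trade-off between specificity and caution is why such approaches depend on a well-curated target database.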