2021
DOI: 10.1038/s41592-021-01141-3

Challenges in benchmarking metagenomic profilers

Abstract: Accurate microbial identification and abundance estimation are crucial for metagenomics analysis. Various methods for classifying metagenomic data and estimating taxonomic profiles, broadly referred to as metagenomic profilers, have been developed. Yet, benchmarking metagenomic profilers remains challenging because some tools are designed to report relative sequence abundance while others report relative taxonomic abundance. Here, we show how misleading conclusions can be drawn by neglecting this distinction b…
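The distinction the abstract draws can be illustrated with a small calculation. The sketch below assumes the usual reading of the two quantities: relative sequence abundance is a taxon's share of sequenced reads, and relative taxonomic abundance is its share of genome copies. It converts one into the other by dividing by genome length. The function name, taxon names, and genome lengths are hypothetical, and the sketch ignores ploidy, unclassified reads, and other biases, so it is not the procedure of any particular profiler.

```python
def sequence_to_taxonomic_abundance(seq_abundance, genome_lengths):
    """Convert relative sequence abundance to relative taxonomic abundance.

    Assumption: reads are sampled uniformly along each genome, so a taxon's
    read share divided by its genome length is proportional to its number
    of genome copies. Ploidy and classification errors are ignored.
    """
    # Read share per base pair, proportional to genome copies per taxon.
    copies = {t: seq_abundance[t] / genome_lengths[t] for t in seq_abundance}
    total = sum(copies.values())
    # Renormalize so the taxonomic abundances sum to 1.
    return {t: c / total for t, c in copies.items()}


# Hypothetical two-taxon community: equal read shares, unequal genome sizes.
seq = {"taxon_A": 0.5, "taxon_B": 0.5}
lengths = {"taxon_A": 2_000_000, "taxon_B": 6_000_000}  # base pairs (made up)
tax = sequence_to_taxonomic_abundance(seq, lengths)
print(tax)
```

In this made-up example both taxa receive half of the reads, yet the taxon with the smaller genome accounts for three quarters of the genome copies. Comparing a tool that reports one quantity against a ground truth expressed in the other would therefore penalize it for a difference in definition rather than a difference in accuracy.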

Cited by 74 publications (109 citation statements)
References 39 publications
“…A growing body of Arctic marine microbiology research is characterizing microbial communities using data from either amplicon-based or shotgun metagenomic sequencing [11,13–15,42], and the latter is also used for functional profiling. For the taxonomic assignment of shotgun metagenomic data, numerous classifiers and reference databases are available that fall into several categories: (i) DNA-to-DNA methods, where perfect matches between sequence stretches and reference sequences (k-mers) are sought (e.g., Kraken2, Bracken, and PathSeq); (ii) DNA-to-protein methods, where sequence reads are compared with protein-coding sequences (e.g., Kaiju and DIAMOND); and (iii) DNA-to-marker methods, including only specific marker gene families in reference databases (e.g., MetaPhlAn2) [43,44]. However, it has been suggested that the classifier performance and ecological truthfulness and representativeness of the results may vary according to the sample type, taxa present, and composition of the reference database used [43].…”
Section: The Effect Of Taxonomic Classification Methods On the Estimation Of Community Composition In Arctic Seawater-derived Bacterial C… (mentioning)
confidence: 99%
“…These taxa are represented by disproportionally high numbers of closely related reference sequences in databases (especially Pseudomonas in Standard Kraken2 and MAR DB and Cycloclasticus in MAR DB), leading to oversampling and the decreased accuracy of classifiers [48]. The difference in Pseudomonas proportions was probably also affected by the tendency of Kraken2 and Kaiju to overestimate the proportions of microbes with larger genome sizes and higher polyploidy [44]. It is also highly probable that some sequences from the metagenomic data of this experiment were misclassified into these dominant genera; this has been previously noted to be a problem for Pseudomonas classifications in Kraken2 [47].…”
Section: The Effect Of Taxonomic Classification Methods On the Estimation Of Community Composition In Arctic Seawater-derived Bacterial C… (mentioning)
confidence: 99%
“…It could be insightful for deciphering the microbial processes when different factors, like diets, drugs, or immune factors, are applied to explore whether the growth patterns of human gut bacteria will change responsively. Furthermore, the inconsistent relative abundance of the same bacterial species identified from different metagenomic profilers (DNA-to-marker methods or DNA-to-DNA methods), 24 endorsed the value of our FISH-based abundance assessment of bacteria in the gut microbiota. Of note, in this HMA model, a new method that we recently developed (MeDabLISH) 25 to measure bacterial growth rates in vivo using their FDAA labeling and FISH staining, can also be applied.…”
Section: Discussion (mentioning)
confidence: 99%
“…Bioinformatic analysis also turned out to be non-trivial since the completeness and accuracy of the ever-growing sequence databases and different approaches of taxonomic methods have demonstrated to have an important effect on results [3,27,28]. Thus, careful interpretation and constant benchmarking of analysis methods and databases will be key for taxonomic classification and metagenomic applications success [29].…”
Section: Discussion (mentioning)
confidence: 99%