Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies

Thorsen, Jonathan; Brejnrod, Asker; Mortensen, Martin Steen; Rasmussen, Morten Arendt; Stokholm, Jakob; Al‐Soud, Waleed Abu; Sørensen, Søren J.; Bisgaard, Hans; Waage, Johannes

doi:10.1186/s40168-016-0208-8

Cited by 154 publications

(181 citation statements)

References 38 publications

Supporting

Mentioning

160

Contrasting

Unclassified

“…The similarities with respect to sparsity observed in both scRNA-seq and metagenomics data led us to pose the question of whether statistical methods developed for the differential expression of scRNA-seq data perform well on metagenomic DA analysis. Some benchmarking efforts have compared the performance of methods [9][10][11][12] both adapted from bulk RNA-seq and developed for microbiome DA 13,14 . While some tools exist to guide researchers 15 , a general consensus on the best approach is still missing, especially regarding the methods' capability of controlling false discoveries.…”

Section: Introduction (mentioning)

confidence: 99%

Assessment of statistical methods from single cell, bulk RNA-seq and metagenomics applied to microbiome data

Calgaro

Romualdi

Risso

et al. 2020

Preprint

The correct identification of differentially abundant microbial taxa between experimental conditions is a methodological and computational challenge. Recent work has shown that commonly used methods do not control the false discovery rate due to the peculiarity of these data (e.g. high sparsity), leading to an abundance of false positive results. Since single-cell RNA-seq shares some of these peculiarities, we apply methods developed for single cell differential expression to microbiome data. We compare these approaches to methods developed for bulk RNA-seq and microbiome data, in terms of suitability of distributional assumptions, ability to control false discoveries, consistency, replicability, and power. We benchmark these methods using 100 manually curated datasets from 16S and whole metagenome shotgun sequencing. A simulation framework is developed to assess the impact of experimental design in power analysis. Our analyses suggest that DESeq2 and limma-voom show the best performance. We recommend a careful exploratory data analysis prior to application of any inferential model and we present a framework to help scientists make an informed choice of analysis methods in a dataset-specific manner.

“…To quantify differences in proportions of features between two sampling groups (often referred to as ‘differential relative abundance testing’; Thorsen et al, 2016, Weiss et al, 2017), posterior probability distributions (PPDs) for $\pi_{j,k=1} - \pi_{j,k=2}$ (Figure 2d) can be obtained. Consistent with convention, if 95% of the samples of this PPD of differences are either greater or less than zero, then there is a high certainty of a nonzero effect of sampling group on feature relative abundance.…”

Section: Methods (mentioning)

confidence: 99%

“…To quantify differences in proportions of features between two sampling groups (often referred to as 'differential relative abundance testing'; Thorsen et al, 2016, Weiss et al, 2017), posterior probability distributions (PPDs) for $\pi_{j,k=1} - \pi_{j,k=2}$ (Figure 2d) can be obtained.…”

Section: Dirichlet Multinomial Modelling Approach (mentioning)

confidence: 99%

“…Such effects will go unnoticed if analyses rely on techniques such as ordination and PERMANOVA, which can provide insight into overall differences between sampling groups (McKnight et al, 2019), but provide no statistical model to identify those features that may differ in relative abundance among groups. Accordingly, a variety of methods have been developed to perform the seemingly simple task of determining treatment-induced shifts in relative abundance, which is often referred to as 'differential relative abundance testing' or 'differential expression' testing (the latter phrase arises because the roots of many of these methods lie within the field of functional genomics; Bullard, Purdom, Hansen, & Dudoit, 2010; Dillies et al, 2013; Paulson, Stine, Bravo, & Pop, 2013; Thorsen et al, 2016; Weiss et al, 2017).…”

Section: Introduction (mentioning)

confidence: 99%

Dirichlet‐multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data

Harrison

Calder

Shastry

et al. 2020

Molecular Ecology Resources

Molecular ecology regularly requires the analysis of count data that reflect the relative abundance of features of a composition (e.g., taxa in a community, gene transcripts in a tissue). The sampling process that generates these data can be modelled using the multinomial distribution. Replicate multinomial samples inform the relative abundances of features in an underlying Dirichlet distribution. These distributions together form a hierarchical model for relative abundances among replicates and sampling groups. This type of Dirichlet‐multinomial modelling (DMM) has been described previously, but its benefits and limitations are largely untested. With simulated data, we quantified the ability of DMM to detect differences in proportions between treatment and control groups, and compared the efficacy of three computational methods to implement DMM—Hamiltonian Monte Carlo (HMC), variational inference (VI), and Gibbs Markov chain Monte Carlo. We report that DMM was better able to detect shifts in relative abundances than analogous analytical tools, while identifying an acceptably low number of false positives. Among methods for implementing DMM, HMC provided the most accurate estimates of relative abundances, and VI was the most computationally efficient. The sensitivity of DMM was exemplified through analysis of previously published data describing lung microbiomes. We report that DMM identified several potentially pathogenic, bacterial taxa as more abundant in the lungs of children who aspirated foreign material during swallowing; these differences went undetected with different statistical approaches. Our results suggest that DMM has strong potential as a statistical method to guide inference in molecular ecology.
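The 95% decision rule quoted in the citation statements above can be illustrated with a minimal sketch. This is not the authors' implementation: instead of the hierarchical Dirichlet-multinomial model fitted with HMC, VI, or Gibbs sampling, it assumes counts pooled per sampling group and a flat Dirichlet prior, so that the posterior for each group's proportions is simply Dirichlet(prior + counts). The function names, the toy counts, and the `prior` and `threshold` parameters are all hypothetical.

```python
import random


def dirichlet_sample(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def posterior_difference_samples(counts_g1, counts_g2, n_draws=4000,
                                 prior=1.0, seed=0):
    """Sample pi_{j,k=1} - pi_{j,k=2} for every feature j.

    Simplifying assumption (not the hierarchical DMM of the paper):
    counts are pooled per group and the prior is a flat Dirichlet,
    so each group's posterior is Dirichlet(prior + counts).
    """
    rng = random.Random(seed)
    alpha1 = [prior + c for c in counts_g1]
    alpha2 = [prior + c for c in counts_g2]
    diffs = []
    for _ in range(n_draws):
        p1 = dirichlet_sample(alpha1, rng)
        p2 = dirichlet_sample(alpha2, rng)
        diffs.append([a - b for a, b in zip(p1, p2)])
    return diffs


def certain_nonzero(diffs, threshold=0.95):
    """Flag feature j when at least `threshold` of the sampled
    differences lie on one side of zero."""
    n = len(diffs)
    flags = []
    for j in range(len(diffs[0])):
        frac_pos = sum(1 for d in diffs if d[j] > 0) / n
        # 1 - frac_pos is the fraction of samples that are not positive.
        flags.append(max(frac_pos, 1.0 - frac_pos) >= threshold)
    return flags


# Toy pooled counts for three features in two sampling groups (made up).
counts_group1 = [500, 300, 200]
counts_group2 = [200, 300, 500]

diffs = posterior_difference_samples(counts_group1, counts_group2)
flags = certain_nonzero(diffs)
print(flags)
```

With these toy counts, the first and third features differ strongly between groups and the second has identical counts, so only the outer two should be flagged. A full analysis would model replicate-level variation rather than pooling counts.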

“…Operational taxonomic units (OTUs) responding significantly across experimental design were extracted using previously described methodology ) using an analysis of deviance (AOD) after generalized linear modelling (GLM) of the raw counts using negative binomial distribution (nb) with 1000 resampling iterations with residual variance, using the package mvabund (nbGLM, likelihood ratio test, p < 0.05, Wang et al 2012). This method was recently suggested as one of the most accurate way to extract significantly responding OTUs by minimizing the risk of error (Thorsen et al 2016). A generalized heatmap of dominant (relative abundance >0.1%) and significantly responding OTUs was generated using previously described methodology .…”

Section: DNA Extraction (mentioning)

confidence: 99%

Effects of phosphorus-mobilizing bacteria on tomato growth and soil microbial activity

et al. 2017

Aims The aim of our study was to clarify whether inoculating a soil with Pseudomonas sp. RU47 (RU47) bacteria would stimulate the enzymatic cleavage of organic P compounds in the rhizosphere and bulk soil, promoting plant growth. Adding either viable or heat treated RU47 cells made it possible to separate direct from indirect effects of the inoculum on P cycling in soil and plants. Methods We performed a rhizobox experiment in the greenhouse with tomato plants (Solanum lycopersicum) under low P soil conditions. Three inoculation treatments were conducted, using unselectively grown soil bacteria (bacterial mix), heat treated (HT-RU47) and viable RU47 (RU47) cells, and one not inoculated, optimally P-fertilized treatment. We verified plant growth, nutrient availability, enzyme activities and microbial community structure in soil. Results A plant growth promotion effect with improved P uptake was observed in both RU47 treatments. Inoculations of RU47 cells increased microbial phosphatase activity (PA) in the rhizosphere. Conclusions Plant growth promotion by RU47 cells is primarily associated with increased microbial PA in soil, while promotion of indigenous Pseudomonads as well as phytohormonal effects appear to be the dominant mechanisms when adding HT-RU47 cells. Thus, using RU47 offers a promising approach for more efficient P fertilization in agriculture.
