2022
DOI: 10.1038/s41467-022-33071-9

Batch effects removal for microbiome data via conditional quantile regression

Abstract: Batch effects in microbiome data arise from differential processing of specimens and can lead to spurious findings and obscure true signals. Strategies designed for genomic data to mitigate batch effects usually fail to address the zero-inflated and over-dispersed microbiome data. Most strategies tailored for microbiome data are restricted to association testing or specialized study designs, failing to allow other analytic goals or general designs. Here, we develop the Conditional Quantile Regression (ConQuR) …

Citations: cited by 49 publications (32 citation statements)
References: 46 publications
“…As an alternative, non‐parametric multivariate approaches can be used, or more recent methods explicitly developed for considering the specificities of microbiome data (Goh et al., 2017; Wang & Lê Cao, 2020). The latter include, from oldest to most recent percentile normalisation (Gibbons et al., 2018), the Bayesian Dirichlet–multinomial regression meta‐analysis (BDMMA) (Dai et al., 2019), the conditional quantile regression (ConQuR) (Ling et al., 2022), the ‘adjust_batch’ tool from the MMUPHin package (Ma, 2022) and three approaches based on the PLS‐DA from the PLSDAbatch package (Wang & Lê Cao, 2023). The latter three, named PLSDA‐batch, sparse PLSDA‐batch (sPLSDA‐batch), and weighted PLSDA‐batch (wPLSDA‐batch), require prior abundance filtering and CLR transformation.…”
Section: Batch Effects (mentioning)
confidence: 99%
“…Moreover, both approaches are only appropriate for a limited subset of differential abundance tests and do not provide batch‐normalised profiles (Ma et al, 2022). ConQuR requires comprehensive metadata to accurately estimate conditional distributions of read counts, which can lead to over‐optimism in association analysis and cannot work if the batch completely confounds the critical variable (Ling et al., 2022). In addition, MMUPHin (Ma, 2022) assumes the data to be zero‐inflated Gaussian, which is only suitable for certain transformations of relative abundance data (Ling et al., 2022), and the PLS‐DA‐based methods require pre‐defined batch group information, so, if unknown, it should be identified with PCA or any clustering approach (Wang & Lê Cao, 2023).…”
Section: Batch Effects (mentioning)
confidence: 99%
“…Besides aforementioned considerations in statistical power and model interpretability, another potential issue is that most existing compositional variable selection methods are based on mean regression models [10,12,15,16], which often fail to accommodate data heterogeneity, probably due to either heteroscedastic variance or other forms of non-location-scale covariate effects [17]. A common solution to this issue would be the quantile regression [18], which has been widely used in biomedical data analysis [19][20][21][22]. In the context of compositional data analysis, Ma and Zhang [23] proposed an adaptive Lasso-based method for compositional quantile regression, whose finite-sample selection accuracy has not yet been investigated.…”
Section: Introduction (mentioning)
confidence: 99%
“…BEs include any sources of unwanted biological, technical or computational variations that are unrelated to, but obscure, the biological element of interest (18). Although microbiome-specific methods have been developed to remove BEs (19)(20)(21)(22)(23), their use is not yet widespread and, to the best of our knowledge, they have not been employed in any 16S rRNA gene sequencing research on salivary microbiota to date.…”
Section: Introduction (mentioning)
confidence: 99%