2023
DOI: 10.1093/bib/bbac622

PLSDA-batch: a multivariate framework to correct for batch effects in microbiome data

Abstract: Microbial communities are highly dynamic and sensitive to changes in the environment. Thus, microbiome data are highly susceptible to batch effects, defined as sources of unwanted variation that are not related to and obscure any factors of interest. Existing batch effect correction methods have been primarily developed for gene expression data. As such, they do not consider the inherent characteristics of microbiome data, including zero inflation, overdispersion and correlation between variables. We introduce…

Cited by 27 publications (30 citation statements)
References 68 publications (78 reference statements)
“…Researchers should try to ensure that factors such as age/sex/genetics of their samples, sampling location/time, kit type/processing time, etc. (Wang and Lê Cao, 2020) do not overlap to large degrees with the actual biological variation they are testing in their experiments. If, however, such conflation is unavoidable due to the nature of the study system, there are a number of post hoc statistical computational methods that have been developed for dealing with such batch effects, specifically for microbiome data (Gibbons et al, 2018; Ma et al, 2020; Wang and Lê Cao, 2020).…”
Section: Batch Effects (mentioning)
confidence: 99%
“…We evaluated the difference in variability across samples by gene set size with principal component analysis (PCA) with the mixOmics R package (version 6.18.1) pca() function to determine principal components, 23 and then used the PLSDAbatch package (version 0.2.1) Scatter_Density() function to generate PCA and density rugplots to investigate the variance between sample origin types (i.e., cell line, PDX, non-disease tissue, tumor tissue; between brain-derived tissues and GBM-derived models in Figure 2B; between all samples of all sources in Supplemental Figure 1A), as well as with respect to other variables that may impact variance (i.e., read length, tissue type, sample source, sex; Supplemental Figure 1B-E). 24 We identified tumor purity-correlated genes based on TCGA tumor gene expression significantly correlated with ABSOLUTE-generated tumor purity. We also defined tissue-specific gene expression from GTEx using the Harmonizome (accessed November 2022).…”
Section: Gene Subsets (mentioning)
confidence: 99%
“…We evaluated the difference in variability across samples by gene set size with principal component analysis (PCA) with the mixOmics R package (version 6.18.1) pca() function to determine principal components, 24 and then used the PLSDAbatch package (version 0.2.1) Scatter_Density() function to generate PCA and density rugplots to investigate the variance between sample origin types (i.e., cell line, PDX, non-disease tissue, tumor tissue; between brain-derived tissues and GBM-derived models in Figure 2B; between all samples of all sources in Figure S1A), as well as with respect to other variables that may impact variance (i.e., read length, tissue type, sample source, sex; Figure S1B-E). 25 We identified tumor purity-correlated genes based on TCGA tumor gene expression significantly correlated with ABSOLUTE-generated tumor purity. We also defined tissue-specific gene expression from GTEx using the Harmonizome (accessed November 2022).…”
Section: Gene Subsets (mentioning)
confidence: 99%
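
The citation statements above describe using PCA to check whether samples separate by an unwanted variable such as sample source. The sketch below illustrates that idea only. It is not the mixOmics `pca()` or PLSDAbatch `Scatter_Density()` code the quoted authors used, and it performs no batch correction. It assumes hypothetical toy data (two batches, with one shifted on every feature) and hypothetical helper names (`center_columns`, `first_pc`), and it finds the first principal component by power iteration using only the Python standard library.

```python
import random


def center_columns(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_pc(rows, iters=200):
    """Return (axis, scores) for the first principal component.

    Uses power iteration on X^T X, where X is the column-centered data.
    """
    X = center_columns(rows)
    p = len(X[0])
    axis = [1.0 / p ** 0.5] * p  # arbitrary unit-length starting vector
    for _ in range(iters):
        # proj = X v, then w = X^T (X v)
        proj = [sum(x * a for x, a in zip(row, axis)) for row in X]
        w = [sum(s * row[j] for s, row in zip(proj, X)) for j in range(p)]
        norm = sum(c * c for c in w) ** 0.5
        axis = [c / norm for c in w]
    scores = [sum(x * a for x, a in zip(row, axis)) for row in X]
    return axis, scores


# Hypothetical toy data: batch B is shifted by a constant on every feature,
# standing in for a batch effect unrelated to any factor of interest.
random.seed(0)
n_per_batch, n_features, batch_shift = 10, 5, 4.0
batch_a = [[random.gauss(0.0, 1.0) for _ in range(n_features)]
           for _ in range(n_per_batch)]
batch_b = [[random.gauss(batch_shift, 1.0) for _ in range(n_features)]
           for _ in range(n_per_batch)]
data = batch_a + batch_b

axis, scores = first_pc(data)
scores_a, scores_b = scores[:n_per_batch], scores[n_per_batch:]

print("mean PC1 score, batch A:", sum(scores_a) / n_per_batch)
print("mean PC1 score, batch B:", sum(scores_b) / n_per_batch)
```

When the batch shift dominates the noise, as in this toy setup, the two batches should land on opposite sides of the first component. That pattern is the kind of signal a PCA scatter plot with density rugs is meant to make visible before any correction is attempted.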