2017
DOI: 10.1101/202903
Preprint

Addressing confounding artifacts in reconstruction of gene co-expression networks

Abstract: Gene co-expression networks can capture biological relationships between genes, and are important tools in predicting gene function and understanding disease mechanism. We show that artifacts such as batch effects in gene expression data confound commonly used network reconstruction algorithms. We then demonstrate, both theoretically and empirically, that principal component correction of gene expression measurements prior to network inference can reduce false discoveries. Using expression data from the GTEx p…

Cited by 19 publications (28 citation statements)
References: 40 publications
“…First, for each tissue, we only included genes for which the corresponding median expression (median(log2(RPKM + 1))) in that specific tissue was greater than zero. Then, prior to network construction, we preprocessed the expression data (on the log2(RPM/10 + 1) scale) to remove unwanted variation, which can confound the estimation of pairwise correlation coefficients between genes (Freytag et al 2015; Parsana et al 2017). Parsana et al (2017) established that this can be addressed by removing leading principal components from the expression matrix.…”
Section: Coexpression Analysis
Citation type: mentioning
confidence: 99%
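
The gene filter described in the statement above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the citing authors' code: the function name `filter_expressed_genes`, the genes-by-samples layout, and the toy values are all assumptions made for the example.

```python
import numpy as np

def filter_expressed_genes(rpkm, gene_ids):
    """Keep genes whose median log2(RPKM + 1) across samples is > 0.

    rpkm     : array of shape (n_genes, n_samples), one tissue (assumed layout)
    gene_ids : list of n_genes identifiers (hypothetical names)
    """
    rpkm = np.asarray(rpkm, dtype=float)
    log_expr = np.log2(rpkm + 1.0)            # log2(RPKM + 1) transform
    keep = np.median(log_expr, axis=1) > 0    # per-gene median over samples
    kept_ids = [g for g, k in zip(gene_ids, keep) if k]
    return log_expr[keep], kept_ids

# Toy data: three genes, three samples.
rpkm = np.array([
    [0.0, 0.0, 5.0],   # median RPKM is 0, so median log2(RPKM + 1) is 0: dropped
    [1.0, 2.0, 3.0],   # kept
    [0.0, 4.0, 4.0],   # kept
])
filtered, kept = filter_expressed_genes(rpkm, ["geneA", "geneB", "geneC"])
```

Because log2(x + 1) is monotone and equals zero only at x = 0, the filter keeps exactly those genes with a nonzero RPKM value in more than half of the samples.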
“…Then, prior to network construction, we preprocessed the expression data (on the log2(RPM/10 + 1) scale) to remove unwanted variation, which can confound the estimation of pairwise correlation coefficients between genes (Freytag et al 2015; Parsana et al 2017). Parsana et al (2017) established that this can be addressed by removing leading principal components from the expression matrix. To avoid overfitting, we removed the same number of principal components from all tissues.…”
Section: Coexpression Analysis
Citation type: mentioning
confidence: 99%
“…Almost universally, these approaches are only designed for comparisons between two groups, and none can accommodate continuous predictor variables. Further, these approaches are only designed for data sets in which no confounders or covariates may bias co-expression estimates, which could lead to false conclusions if not accounted for [12]. This limited flexibility has made it difficult to identify individual-specific factors that predict co-expression in humans or other organism that are not amenable to controlled manipulations.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…First, for each tissue, we only included genes where the corresponding tissue-specific median expression (median(log2(RPKM + 1))) was greater than zero. Then, prior to network construction, we preprocessed the expression data to remove unwanted variation, since it is known that it can confound the estimation of pairwise correlation coefficients between genes (Freytag et al, 2015; Parsana et al, 2017). To achieve this, for each tissue, we standardized the expression matrix (containing log2(RPM + 1) values) to have mean 0 and variance 1 across every gene, and removed the 4 leading principal components from this matrix by regressing on the PCs and then reconstructing a new matrix with the regression residuals, using the function removePrincipalComponents() in the WGCNA package (Zhang and Horvath, 2005; Langfelder and Horvath, 2008).…”
Section: Co-expression Analysis
Citation type: mentioning
confidence: 99%
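
The principal component correction described in the statement above (standardize each gene, regress on the leading PCs, keep the residuals) can be sketched in Python with NumPy. This is an illustrative analogue only: the citing authors used removePrincipalComponents() from the R package WGCNA, and this sketch does not reproduce that function's internals. The function name `remove_leading_pcs`, the samples-by-genes layout, and the simulated data are assumptions.

```python
import numpy as np

def remove_leading_pcs(expr, n_pcs=4):
    """Return residuals after regressing out the top principal components.

    expr  : array of shape (n_samples, n_genes), e.g. log2(RPM + 1) values
            (assumed layout; genes with zero variance are not handled here)
    n_pcs : number of leading PCs to remove (4 in the quoted statement)
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize every gene to mean 0 and variance 1 across samples.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    # PCA via SVD: columns of u * s are the per-sample PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]
    # Least-squares regression of every gene on the PC scores.
    # No intercept is needed because z and the scores are both centered.
    coef, *_ = np.linalg.lstsq(pcs, z, rcond=None)
    return z - pcs @ coef

# Simulated example: 50 samples, 200 genes, plus one shared "batch" factor.
rng = np.random.default_rng(0)
batch = rng.normal(size=(50, 1))
expr = rng.normal(size=(50, 200)) + 2.0 * batch
residual = remove_leading_pcs(expr, n_pcs=4)
```

Gene-gene correlations would then be computed on `residual` rather than on the raw matrix. The number of PCs to remove is a tuning choice; the quoted statements fix it at 4 and use the same number for every tissue.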
“…To achieve this, for each tissue, we standardized the expression matrix (containing log2(RPM + 1) values) to have mean 0 and variance 1 across every gene, and removed the 4 leading principal components from this matrix by regressing on the PCs and then reconstructing a new matrix with the regression residuals, using the function removePrincipalComponents() in the WGCNA package (Zhang and Horvath, 2005; Langfelder and Horvath, 2008). This has been shown to remove unwanted variation for co-expression analysis (Parsana et al, 2017). In Supplementary Figure S9 we depict the impact of doing this on the distribution of pairwise correlations across (1) 2000 randomly selected genes and (2) 80 genes encoding for the protein component of the ribosome (Supplementary Table 7), following ideas from Freytag et al (2015).…”
Section: Co-expression Analysis
Citation type: mentioning
confidence: 99%