2007
DOI: 10.1371/journal.pgen.0030161

Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis

Abstract: It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power…
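
As an illustration of the idea in the abstract, the following is a minimal sketch of how an unmeasured factor can be recovered from expression data. It is not the published surrogate variable analysis algorithm: it only regresses out the measured variable and takes the leading singular vectors of the residuals as candidate surrogate variables. The function name `residual_surrogates`, the simulated data, and all parameter values are assumptions made for this example.

```python
import numpy as np

def residual_surrogates(expr, design, n_sv):
    """Return n_sv candidate surrogate variables (samples x n_sv).

    expr   : genes x samples expression matrix
    design : samples x p matrix of measured covariates (with intercept)
    Simplified sketch only; the full SVA procedure is not reproduced here.
    """
    # Least-squares fit of every gene on the measured covariates.
    coef, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    resid = expr.T - design @ coef  # samples x genes
    # Leading left singular vectors of the residuals describe the
    # dominant sample-level structure left unexplained by the design.
    u, _, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, :n_sv]

# Simulated data (hypothetical): one measured group variable and one
# hidden factor that also affects many genes.
rng = np.random.default_rng(0)
n_genes, n_samples = 500, 40
group = np.repeat([0.0, 1.0], n_samples // 2)
hidden = rng.normal(size=n_samples)
design = np.column_stack([np.ones(n_samples), group])
expr = (
    rng.normal(size=(n_genes, 1)) * group           # effect of interest
    + 2.0 * rng.normal(size=(n_genes, 1)) * hidden  # unmeasured factor
    + rng.normal(size=(n_genes, n_samples))         # noise
)

sv = residual_surrogates(expr, design, n_sv=1)

# Compare the estimate with the part of the hidden factor that is not
# already explained by the measured covariates.
hidden_coef = np.linalg.lstsq(design, hidden, rcond=None)[0]
hidden_resid = hidden - design @ hidden_coef
corr = abs(np.corrcoef(sv[:, 0], hidden_resid)[0, 1])
print(f"|correlation| with hidden factor: {corr:.3f}")
```

In a downstream analysis, the columns of `sv` would be appended to `design` before testing each gene, so that the hidden factor is adjusted for rather than absorbed into the error term.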

Citations: cited by 1,696 publications (1,698 citation statements)
References: 38 publications
“…In our primary analysis, we applied aggressive correction of potential confounders, controlling for 15–35 probabilistic estimation of expression residuals (PEER) factors 12 capturing 59–78% of total variance in gene expression levels (Supplementary Information 5). However, PEER and related approaches 30 may also remove variance in gene expression levels arising from regulatory pathways and broad trans effects. Indeed, several loci with numerous associations were found in uncorrected data, but disappeared after controlling for PEER factors (Supplementary Fig.…”
Section: Functional Characterization Of Trans-eQTLs (mentioning)
confidence: 99%
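
The trade-off described in this statement can be sketched numerically. The example below uses plain principal components as a stand-in for PEER factors (PEER itself is a Bayesian factor model and is not reproduced here) and shows that removing the top factor also removes a genuine broad effect of one regulator on many genes. The helper names `variance_explained`, `remove_top_factors`, and `mean_abs_corr`, and the simulated data, are assumptions made for illustration.

```python
import numpy as np

def variance_explained(expr, k):
    """Fraction of total variance captured by the top k components."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

def remove_top_factors(expr, k):
    """Subtract the rank-k SVD approximation from the centered data."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered - (u[:, :k] * s[:k]) @ vt[:k]

def mean_abs_corr(mat, v):
    """Mean absolute Pearson correlation between each row and v."""
    m = mat - mat.mean(axis=1, keepdims=True)
    vc = v - v.mean()
    r = (m @ vc) / (np.linalg.norm(m, axis=1) * np.linalg.norm(vc))
    return float(np.abs(r).mean())

# Hypothetical data: one regulator with a broad effect on all genes.
rng = np.random.default_rng(1)
n_genes, n_samples = 300, 60
regulator = rng.normal(size=n_samples)
loadings = rng.normal(size=(n_genes, 1))
expr = loadings * regulator + 0.5 * rng.normal(size=(n_genes, n_samples))

frac = variance_explained(expr, k=1)
before = mean_abs_corr(expr, regulator)
after = mean_abs_corr(remove_top_factors(expr, k=1), regulator)
print(f"variance captured by top factor: {frac:.2f}")
print(f"mean |corr| with regulator before: {before:.2f}, after: {after:.2f}")
```

Because the regulator's effect is the dominant axis of variation in this simulation, the top factor absorbs it, and the association is largely gone after correction. Real data need not behave this way; the sketch only illustrates the mechanism the statement describes.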
“…Model (4.1), which leads to PCs as the MLE of the sufficient reduction, has a very general mean function, but its error structure is restrictive. Nevertheless, this error structure has recently been used in studies of gene expression (X) that are complicated by stratification and heterogeneity (Leek & Storey 2007). On the other hand, the usual linear model has a restrictive mean function, and under that model alone there seems to be no clear rationale for reduction by PCs (Cox 1968).…”
Section: Normal Inverse Models (mentioning)
confidence: 99%
“…For the purpose of applying this method to data sets that arise for example from gene expression studies via microarrays, there is much evidence that the data for different genes cannot be considered to be independent of each other, and correction methods such as surrogate variable analysis (SVA) (Leek & Storey, 2007) or its Partial Least Squares variant (SVA-PLS) (Chakraborty, Datta, Somnath, & Datta, Susmita, 2012) need to be applied to correct for this to allow much of this dependence to be accounted for in extra surrogate variables to be fitted. The p-values from the corrected tests for significance of the genes are almost (theoretically exactly) independent (Leek, & Storey, 2008).…”
Section: Discussion (mentioning)
confidence: 99%
“…Probability models for these patterns underlie the statistical analysis used in the search for such genes. Much recent theoretical work involves correction for effects not explicitly modelled that cause correlation among the data for individual tests (Leek & Storey, 2007;Leek & Storey, 2008;Lunceford et al, 2011;Chakraborty, Datta, Somnath, & Datta, Susmita, 2012), while the simpler problem of handling independent tests (Storey, 2007;Hwang & Liu 2010), does not seem to have been fully explored in its practical implementation when one or both hypotheses have unknown (hyper)parameters (Nixon, 2012). This may be because the perception of the need to deal with dependence makes such a study almost irrelevant.…”
Section: Introduction (mentioning)
confidence: 99%