2022
DOI: 10.1038/s41587-022-01440-w

Removing unwanted variation from large-scale RNA sequencing data with PRPS

Abstract: Accurate identification and effective removal of unwanted variation is essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes an…

Cited by 29 publications (23 citation statements). References 68 publications.

“…The general use of the term “batch effect” has also included, sometimes explicitly (25) but often implicitly (26–28), biological variations that are deemed ignorable within the scope of a study. For example, consider a hypothetical study where one cannot control the time of day of sample collection.…”
Section: Results (mentioning)
confidence: 99%
“…The standard method involves within-sample normalization using positive control reporters and housekeeping genes, as well as the removal of endogenous genes found to be at or below the level of ‘noise’, which is calculated using the observed values of the negative control genes (Waggott et al, 2012). Alternatively, the Removing Unwanted Variation-III (RUV-III) method has been demonstrated to provide improved performance for datasets including technical replicates, particularly when those technical replicates span multiple batches, and it can also be used for normalization without true replicates by generating pseudoreplicates from pseudosamples (Molania et al, 2019, 2022). Finally, the RUVg method has been demonstrated on multiple NanoString datasets to provide improved normalization performance using housekeeping genes (Bhattacharya et al, 2020; Risso et al, 2014).…”
Section: Methods and Features (mentioning)
confidence: 99%
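
The RUV-III idea in the statement above (use replicates, or pseudo-replicates, to expose unwanted variation, then remove it using negative-control genes) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it loosely follows the steps of the `RUVIII` function in the R package `ruv` as we understand them, and the helper name `ruv_iii`, its arguments, and the simulated data are hypothetical. Rows of `Y` are samples and columns are genes on a log scale; `M` marks which samples are replicates of each other; `ctl` flags negative-control genes assumed to carry no biology of interest; `k` is the number of unwanted factors to remove.

```python
import numpy as np


def ruv_iii(Y, M, ctl, k):
    """Sketch of an RUV-III-style correction (hypothetical helper, not a library API).

    Y   : (m samples x n genes) log-expression matrix
    M   : (m x g) replicate matrix; M[i, j] = 1 if sample i belongs to replicate set j
    ctl : boolean mask of length n marking negative-control genes
    k   : number of unwanted factors to remove
    """
    Y = np.asarray(Y, dtype=float)
    M = np.asarray(M, dtype=float)
    ctl = np.asarray(ctl, dtype=bool)
    # Subtract replicate-set means: what remains is within-replicate variation,
    # which RUV-III treats as unwanted.
    Y0 = Y - M @ np.linalg.solve(M.T @ M, M.T @ Y)
    # Leading left singular vectors of Y0 Y0^T span the unwanted sample-level directions.
    U, _, _ = np.linalg.svd(Y0 @ Y0.T)
    r = min(Y.shape[0] - M.shape[1], int(ctl.sum()))
    alpha = (U[:, :r].T @ Y)[:k]  # (k x n) gene-wise loadings of the unwanted factors
    # Estimate the unwanted factors W (m x k) by regressing control genes on alpha.
    ac = alpha[:, ctl]
    W = np.linalg.solve(ac @ ac.T, ac @ Y[:, ctl].T).T
    return Y - W @ alpha


def replicate_gap(X):
    """Mean absolute difference between paired replicates (rows 0/1, 2/3, ...)."""
    return np.abs(X[0::2] - X[1::2]).mean()


# Toy example (simulated): 3 biological groups, each measured once in two batches.
rng = np.random.default_rng(0)
m, n = 6, 50
groups = np.repeat(np.arange(3), 2)
M = np.eye(3)[groups]
batch = np.tile([0.0, 1.0], 3)[:, None]
ctl = np.zeros(n, dtype=bool)
ctl[:10] = True
signal = rng.normal(size=(3, n))[groups]
signal[:, ctl] = signal[0, ctl]  # control genes: no biological differences
Y = signal + 3 * batch @ rng.normal(size=(1, n)) + rng.normal(scale=0.05, size=(m, n))

corrected = ruv_iii(Y, M, ctl, k=1)
print(f"replicate gap before: {replicate_gap(Y):.3f}, after: {replicate_gap(corrected):.3f}")
```

Because every replicate pair spans both batches, the within-pair differences isolate the batch direction, and after correction the replicate gap should fall to roughly the simulated noise level. When true technical replicates are unavailable, the statement above notes that pseudo-replicates built from pseudo-samples can play the role of `M`.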
“…These undesirable variations fall into four categories: variabilities from notorious batch effects (batch variation) [10], sequencing platforms (platform variation) [11, 12], heterogeneous bio-samples (purity variation) [13, 14] and other unknown technical differences (unknown variation), substantially compromising downstream discoveries. Batch variations arising in different runs at different time points represent the prevailing technical factors [15].…”
Section: Introduction (mentioning)
confidence: 99%