2017
DOI: 10.1186/s12859-017-1847-x

Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data

Abstract: Background: Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies typically profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical…

Cited by 47 publications (30 citation statements)
References 28 publications (23 reference statements)

Citation statements
“…We began by downloading GTEx RNA-seq data (Consortium, 2015). After filtering and quality control, these RNA-seq data included expression information for 30,243 genes measured across 9,435 samples and 38 distinct tissues (Paulson et al, 2017). For each tissue, we used PANDA to integrate gene-gene co-expression information from these data with an initial regulatory network of 644 transcription factors (Weirauch et al, 2014) and transcription factor protein-protein interactions (Szklarczyk et al, 2015).…”
Section: Results (mentioning)
confidence: 99%
“…We used YARN (https://bioconductor.org/packages/release/bioc/html/yarn.html) to perform quality control, gene filtering, and normalization preprocessing on the GTEx RNA-seq data, as described in (Paulson et al, 2017). Briefly, this pipeline tested for sample sex-misidentification, merged related sub-tissues, performed tissue-aware normalization using qsmooth (Hicks et al, 2017), and resulted in a dataset of 9,435 gene expression profiles assaying 30,333 genes in 38 tissues from 549 individuals.…”
Section: Methods (mentioning)
confidence: 99%
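
The tissue-aware normalization step named in this statement can be illustrated with a deliberately simplified sketch: quantile-normalize samples within each tissue separately instead of forcing all tissues onto one shared distribution. This is not qsmooth and not the YARN API; the function names and the toy matrix are hypothetical, and ties are broken arbitrarily rather than averaged.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a genes x samples matrix."""
    order = np.argsort(x, axis=0)                # per-sample ranking of genes
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the k-th smallest values
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        # the k-th smallest gene in sample j receives the k-th reference value
        out[order[:, j], j] = reference
    return out

def groupwise_quantile_normalize(x, groups):
    """Apply quantile normalization separately within each group (tissue)."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(x)
    for g in np.unique(groups):
        cols = np.flatnonzero(groups == g)
        out[:, cols] = quantile_normalize(x[:, cols])
    return out

# Toy data (hypothetical): 3 genes x 4 samples from two tissues.
expr = np.array([[5.0, 4.0, 100.0, 80.0],
                 [2.0, 1.0, 300.0, 90.0],
                 [3.0, 6.0, 200.0, 70.0]])
tissues = ["blood", "blood", "liver", "liver"]
normalized = groupwise_quantile_normalize(expr, tissues)
```

After this step, samples from the same tissue share an identical value distribution, while the two tissues keep their own scales. A smoothed method such as qsmooth sits between this extreme and a single global quantile normalization.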
“…In this study, we have shown that while library size normalization is inadequate, TMM [20] applied to pseudo counts and quantile normalization [27] both work well for normalizing between bulk and single-cell data. T and B cells are both immune cell types; for more diverse cell types, more advanced normalization methods such as smooth quantile normalization [33], implemented for example in YARN [34], may be a useful approach since it can handle differences in gene expression distribution across different types of samples. We also show that ComBat [24] effectively removes technical batch effects.…”
Section: Discussion (mentioning)
confidence: 99%
“…We used the yarn package [33] to preprocess RNA-Seq data from GTEx release version 6.0 [15] as yarn offers tissue-specific filtering and normalisation. This tissue-aware preprocessing ensures we do not filter out genes that show highly specific tissue expression.…”
Section: Methods (mentioning)
confidence: 99%
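
The filtering point in this statement can be made concrete with a small sketch: a gene is kept if it is expressed in enough samples of at least one tissue, rather than in enough samples overall. The thresholds, function names, and counts below are hypothetical illustrations, not the yarn package's interface or defaults.

```python
import numpy as np

def global_keep(counts, min_count=10, min_fraction=0.5):
    """Keep genes expressed (>= min_count) in at least min_fraction of all samples."""
    counts = np.asarray(counts)
    return (counts >= min_count).mean(axis=1) >= min_fraction

def tissue_aware_keep(counts, tissues, min_count=10, min_fraction=0.5):
    """Keep genes that pass the same rule within at least one tissue."""
    counts = np.asarray(counts)
    tissues = np.asarray(tissues)
    keep = np.zeros(counts.shape[0], dtype=bool)
    for t in np.unique(tissues):
        in_tissue = counts[:, tissues == t]
        fraction_expressed = (in_tissue >= min_count).mean(axis=1)
        keep |= fraction_expressed >= min_fraction
    return keep

# Toy counts (hypothetical): rows are genes, columns are samples.
tissues = ["testis", "testis", "brain", "brain", "brain", "brain"]
counts = np.array([
    [50, 60, 40, 55, 45, 70],   # broadly expressed gene
    [200, 180, 0, 1, 0, 2],     # testis-specific gene
    [0, 1, 2, 0, 1, 0],         # near-silent gene
])

kept_globally = global_keep(counts)
kept_tissue_aware = tissue_aware_keep(counts, tissues)
```

In this toy example the testis-specific gene fails the global rule because it is expressed in only 2 of 6 samples, but it passes the tissue-aware rule because it is expressed in every testis sample.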