2016
DOI: 10.1101/063552
Preprint

A cloud-based workflow to quantify transcript-expression levels in public cancer compendia

Abstract: Public compendia of raw sequencing data are now measured in petabytes. Accordingly, it is becoming infeasible for individual researchers to transfer these data to local computers. Recently, the National Cancer Institute funded an initiative to explore opportunities and challenges of working with molecular data in cloud-computing environments. With data in the cloud, it becomes possible for scientists to take their tools to the data and thereby avoid large data transfers. It also becomes feasible to scale compu…

Cited by 24 publications (36 citation statements)
References 54 publications (41 reference statements)
“…Third, neoepitope burden was calculated for each patient weighted by TCGA transcript expression of the transcript(s) of origin for each neoepitope. We identified expressed transcripts in matched TCGA cancer types for each disease type in our cohort (SKCM for melanoma, LUAD/LUSC for NSCLC, COAD for colon cancer, UCEC for endometrial cancer, THCA for thyroid cancer, PRAD for prostate cancer, and KIRC for RCC) from TPM values generated by the National Cancer Institute (52). A transcript was considered "expressed" for a cancer type if the 75th quantile TPM value for that transcript in that disease was greater than 1 TPM.…”
Section: Modified Neoepitope Burden (mentioning)
confidence: 99%
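The "expressed" rule quoted above (75th-quantile TPM greater than 1 within a cancer type) can be sketched in a few lines. This is only an illustration, not the citing authors' code: the function name, the transcript IDs, and the TPM values are hypothetical, and the quantile is computed with linear interpolation, which is an assumption because the statement does not say which quantile definition was used.

```python
from statistics import quantiles


def expressed_transcripts(tpm_by_transcript, threshold=1.0):
    """Return IDs of transcripts whose 75th-percentile TPM exceeds `threshold`.

    `tpm_by_transcript` maps a transcript ID to the TPM values observed
    across the samples of one cancer type (at least two samples each).
    """
    expressed = set()
    for transcript_id, tpm_values in tpm_by_transcript.items():
        # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
        # method="inclusive" interpolates linearly between observed values
        # (an assumed choice, see the note above).
        q3 = quantiles(tpm_values, n=4, method="inclusive")[2]
        if q3 > threshold:  # strictly greater than 1 TPM, as in the statement
            expressed.add(transcript_id)
    return expressed


# Hypothetical TPM values for two transcripts across five samples.
example = {
    "tx_A": [0.0, 0.5, 2.0, 4.0, 8.0],
    "tx_B": [0.0, 0.0, 0.1, 0.2, 3.0],
}
result = expressed_transcripts(example)
```

Here `tx_A` passes the filter and `tx_B` does not, even though `tx_B` has one sample above 1 TPM, because the rule looks at the upper quartile across samples and not at the maximum.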
“…The TCGA transcript-expression level profiles (TPM and count values) of ccRCC and matched normal kidney samples was downloaded from https://osf.io/gqrz9 (Tatlow and Piccolo, 2016) on November 27, 2018, which was quantified by Kallisto (Bray et al, 2016) based on the GENCODE reference transcriptome (version 24). The clinical information of TCGA samples was downloaded through R package TCGAbiolinks (Colaprico et al, 2016).…”
Section: Data and Preprocessing (mentioning)
confidence: 99%
“…Although the systematic analysis of lncRNAs function is being addressed by the FANTOM consortium in loss of function studies, increasing the detection rate of these transcripts combining different studies is difficult because the heterogeneity of analytic methods employed. Current resources that apply uniform analytic methods to create expression summaries from public data do exist but can miss several lncRNAs because their dependency on a pre-existing gene annotation for creating the genes expression summaries 5,6 . We recently created recount2 7 , a collection of uniformly-processed human RNA-seq data, wherein we summarized 4.4 trillion reads from over 70,000 human samples from the Sequence Reads Archive (SRA), The Cancer Genome Atlas (TCGA) 8 , and the Genotype-Tissue Expression (GTEx) 9 projects 7 .…”
Section: Introduction (mentioning)
confidence: 99%