2015
DOI: 10.1101/019067
Preprint

Rail-RNA: Scalable analysis of RNA-seq splicing and coverage

Abstract: RNA sequencing (RNA-seq) experiments now span hundreds to thousands of samples. Current spliced alignment software is designed to analyze each sample separately. Consequently, no information is gained from analyzing multiple samples together, and it is difficult to reproduce the exact analysis without access to original computing resources. We describe Rail-RNA, a cloud-enabled spliced aligner that analyzes many samples at once. Rail-RNA eliminates redundant work across samples, making it more efficient as samples…

Cited by 29 publications (31 citation statements)
References 54 publications (77 reference statements)
“…GENCODE version 28 annotations (39) were downloaded and parsed to collect full coordinates and left and right splice sites of junctions from annotated transcripts. The TCGA phenotype file from Rail-RNA (40) was parsed to collect sample type (primary, recurrent, or metastatic tumor vs. matched normal). A new SQLite3 database was created to index all GTEx and TCGA junctions, with linked tables containing 1) sample ids and associated junction ids; 2) sample ids and phenotype information for each sample; and 3) junction ids and junction information including GENCODE annotation status and location within protein coding gene boundaries.…”
Section: RNA Variant Identification (mentioning)
confidence: 99%
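
The three linked tables described in this statement could be laid out roughly as follows. This is a minimal sketch using Python's built-in `sqlite3` module; every table name, column name, and row of toy data is a hypothetical illustration, not the schema or data used by the citing authors.

```python
import sqlite3

# Hypothetical schema: all table and column names are assumptions made
# for illustration; the citing paper does not list its actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE junctions (
    junction_id    INTEGER PRIMARY KEY,
    chrom          TEXT    NOT NULL,
    left_site      INTEGER NOT NULL,
    right_site     INTEGER NOT NULL,
    annotated      INTEGER NOT NULL,  -- 1 if present in the GENCODE annotation
    in_coding_gene INTEGER NOT NULL   -- 1 if inside protein-coding gene boundaries
);
CREATE TABLE sample_phenotypes (
    sample_id   TEXT PRIMARY KEY,
    cohort      TEXT NOT NULL,        -- e.g. 'GTEx' or 'TCGA'
    sample_type TEXT                  -- e.g. 'primary tumor', 'matched normal'
);
CREATE TABLE sample_junctions (
    sample_id   TEXT    NOT NULL REFERENCES sample_phenotypes(sample_id),
    junction_id INTEGER NOT NULL REFERENCES junctions(junction_id),
    PRIMARY KEY (sample_id, junction_id)
);
""")

# Toy rows, invented purely to exercise the query below.
conn.executemany(
    "INSERT INTO junctions VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "chr1", 100, 200, 1, 1), (2, "chr1", 300, 450, 0, 1)],
)
conn.executemany(
    "INSERT INTO sample_phenotypes VALUES (?, ?, ?)",
    [("S1", "TCGA", "primary tumor"), ("S2", "TCGA", "matched normal")],
)
conn.executemany(
    "INSERT INTO sample_junctions VALUES (?, ?)",
    [("S1", 1), ("S1", 2), ("S2", 1)],
)


def unannotated_junction_ids(connection, sample_type):
    """Return sorted ids of unannotated junctions seen in samples of one type."""
    rows = connection.execute(
        """
        SELECT DISTINCT j.junction_id
        FROM junctions AS j
        JOIN sample_junctions AS sj ON sj.junction_id = j.junction_id
        JOIN sample_phenotypes AS p ON p.sample_id = sj.sample_id
        WHERE j.annotated = 0 AND p.sample_type = ?
        ORDER BY j.junction_id
        """,
        (sample_type,),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping the sample-to-junction links in their own table means a junction shared by many samples is stored once, and phenotype filters reduce to joins.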
“…We simulated reads from 18,303 transcripts using all annotated features on chromosomes 1 and 14, including 10 scenarios of varying read length and paired-end status, using the polyester R package [14]. Our method was run on the reduced-representation output from applying the aligner Rail-RNA [5] to the simulated FASTA files. All other methods extracted information from the full simulated FASTA files.…”
Section: Performance On Dirichlet-negative Binomial Simulated Data (mentioning)
confidence: 99%
“…For Cufflinks [6], we provided the fragment length distribution, and used --total-hits-norm --no-effective-length-correction --no-length-correction options. For our linear model, we utilized Rail-RNA [5] to process the FASTA files in the same manner as in recount2 [4]. For evaluation, each method's abundance estimates (est) were compared to the true number (truth) using mean absolute error (MAE):…”
Section: Dirichlet-negative Binomial Simulation Scenarios (mentioning)
confidence: 99%
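
The formula that follows "MAE" is cut off in this excerpt. As a hedged illustration only, the textbook definition of mean absolute error (the average of |est - truth| over all items) can be sketched as below. The function name and the toy numbers are assumptions, and the citing paper may apply additional scaling that is not visible here.

```python
def mean_absolute_error(est, truth):
    """Textbook MAE: average absolute difference between paired values.

    `est` and `truth` are equal-length sequences of numbers. This is the
    standard definition, not necessarily the exact formula of the citing paper.
    """
    if len(est) != len(truth):
        raise ValueError("est and truth must have the same length")
    if len(est) == 0:
        raise ValueError("at least one value is required")
    return sum(abs(e - t) for e, t in zip(est, truth)) / len(est)


# Toy abundance estimates versus true counts (invented numbers).
example_mae = mean_absolute_error([10, 0, 5], [8, 1, 5])
```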
“…to calculate the percentage of genome with at least 30X read depth) or to identify genomic regions overlapped by insufficient number of reads for reliable variant calling [4]. Finally, depth of coverage is one of the most computationally intensive parts of differential expression analysis using RNA-seq data at single-base resolution [5,6,7].…”
Section: Introduction (mentioning)
confidence: 99%
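
The "percentage of genome with at least 30X read depth" mentioned in this statement can be illustrated with a small sketch. It assumes reads are given as 0-based, half-open `(start, end)` intervals on a single sequence. The function names are hypothetical, and real tools operate on alignment files, not in-memory lists.

```python
from itertools import accumulate


def per_base_depth(reads, genome_length):
    """Depth at every position, computed with a difference array.

    Each read adds +1 at its start and -1 at its end; a running sum then
    gives the number of reads overlapping each base.
    """
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    return list(accumulate(diff[:-1]))


def fraction_at_depth(depth, min_depth):
    """Fraction of positions whose depth is at least `min_depth`."""
    if not depth:
        return 0.0
    return sum(d >= min_depth for d in depth) / len(depth)


# Toy example: three reads on an 8-base sequence, with a threshold of 2
# standing in for the 30X threshold in the text.
depth = per_base_depth([(0, 4), (2, 6), (2, 3)], genome_length=8)
covered = fraction_at_depth(depth, min_depth=2)
```

The difference array keeps the cost linear in the number of reads plus the sequence length, which is why single-base resolution is feasible at all, though still expensive at genome scale.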