2016
DOI: 10.1101/062497
Preprint

Rapid and efficient analysis of 20,000 RNA-seq samples with Toil

Abstract: Toil is portable, open-source workflow software that supports contemporary workflow definition languages and can be used to securely and reproducibly run scientific workflows efficiently at large-scale. To demonstrate Toil, we processed over 20,000 RNA-seq samples to create a consistent meta-analysis of five datasets free of computational batch effects that we make freely available. Nearly all the samples were analysed in under four days using a commercial cloud cluster of 32,000 preemptable cores. …

Citations: cited by 31 publications (28 citation statements)
References: 16 publications
“…The TARGET neuroblastoma, other pediatric tumor RNA sequencing data and GTEx RNA sequencing data was downloaded from S3 buckets (Amazon; s3://cgl-rnaseq-recompute-fixed/target/ and s3://cgl-rnaseq-recompute-fixed/gtex/) on 8/5/2016 from prior processed data as described in detail below from the UCSC Computational Genomics Laboratory (Vivian et al, 2016). …”
Section: Methods Details (mentioning)
confidence: 99%
“…Docker containers that are coded in CWL and WDL facilitate scalable, efficient, and reproducible deployment of tools across platforms including cloud environments. In addition, the BD2K Genomics Center has developed Toil [36] (Table 2, Standards category), and the BDDS center has developed Globus Genomics [37] (Table 2, Computing Platform category). Similar to Cromwell, Nextflow, and Arvados (Table 2, Computing Platform category), the aim of Toil and Globus Genomics is to make it easier for users to run large-scale analyses.…”
Section: Software and Systems (mentioning)
confidence: 99%
“…It is important to note that the “parent-child” terminology is also applied to relations between individual workflow nodes by the Toil project, an executor which can also interpret Common Workflow Language. 10 However, Rabix uses these terms to refer to computational "jobs" and "subjobs", e.g. a “nested” workflow node is a child of a workflow and can be decomposed into an array of “subjobs”.…”
Section: Abstract Representation Of Data Analysis Workflows In Rabix (mentioning)
confidence: 99%
“…As compared to other CWL execution models 2,10 , computational events are triggered by “port” events instead of “job” events. In other words, when a port is evaluated, this triggers the executor to scan or update these tables in the following order: Variables, Jobs, Links.…”
Section: Optimization Of CWL Workflows Via DAG Transformations (mentioning)
confidence: 99%