2019
DOI: 10.1093/gigascience/giz037

GenPipes: an open-source framework for distributed and scalable genomic analyses

Abstract: Background: With the decreasing cost of sequencing and the rapid developments in genomics technologies and protocols, the need for validated bioinformatics software that enables efficient large-scale data processing is growing. Findings: Here we present GenPipes, a flexible Python-based framework that facilitates the development and deployment of multi-step workflows optimized for high-performance computing clusters and the cloud. GenPipes already implements 12 validated …

citations
Cited by 147 publications
(122 citation statements)
references
References 70 publications
“…DRIP-seq analysis. FastQ files of DRIP-seq reads were trimmed with Trimmomatic (PE -phred33), using the GenPipes ChIP-seq pipeline (steps 1-3) 64 . Reads with both mate pairs were aligned to the dm3 version of the Drosophila genome using Bowtie2/2.3.1(-fr -no-mixed -no-unal) 65 .…”
Section: Discussion (mentioning)
confidence: 99%
“…Standard ChIP-seq analysis relies on aligning reads to a reference sequence followed by peak calling [1] [2]. While the reference genome is a good approximation of the sequence under study, it does not account for the millions of small genetic variants, the larger structural variants or the two haplotypes of the human genome [3].…”
Section: Introduction (mentioning)
confidence: 99%
“…Workflow management systems together with Linux containers offer a solution to efficiently analyze large scale datasets in a highly reproducible, scalable and parallelizable manner. During the last decade, an increasing interest in the field has led to the development of different programs such as Snakemake (Köster and Rahmann, 2012), NextFlow (Di Tommaso et al, 2017), Galaxy (Afgan et al, 2018), SciPipe (Lampa et al, 2019) or GenPipes (Bourgey et al, 2019), among others. These tools enable the prototyping and deployment of pipelines by abstracting computational processes and representing pipelines as directed graphs, in which nodes represent tasks to be executed and edges represent either data flow or execution dependencies between different tasks.…”
Section: Overview Of the Masterofpores Workflow (mentioning)
confidence: 99%
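
The directed-graph representation described in the statement above can be illustrated with a minimal Python sketch. It is not the GenPipes API or that of any of the tools named; the task names (`trim`, `align`, `call_peaks`, `qc_report`) and the `execution_order` helper are hypothetical and chosen only for illustration. It uses the standard-library `graphlib` module (Python 3.9+) to derive a run order that respects every dependency edge.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task (node); each value is the set of
# tasks it depends on (incoming edges). These names are illustrative only.
dependencies = {
    "trim": set(),
    "align": {"trim"},
    "call_peaks": {"align"},
    "qc_report": {"trim", "align"},
}


def execution_order(deps):
    """Return one valid order in which the tasks can run.

    A task appears only after all of the tasks it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Several orders can be valid for the same graph (here `call_peaks` and `qc_report` are independent of each other), which is what lets a workflow manager submit such tasks to a cluster in parallel.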