2019
DOI: 10.1186/s12859-019-3108-7

DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark

Abstract: Background: XHMM is a widely used tool for copy-number variant (CNV) discovery from whole exome sequencing data but can require hours to days to run for large cohorts. A more scalable implementation would reduce the need for specialized computational resources and enable increased exploration of the configuration parameter space to obtain the best possible results. Results: DECA is a horizontally scalable implementation of the XHMM alg…

Citations: cited by 3 publications (4 citation statements)
References: 18 publications (12 reference statements)
“…The table shows that apart from some tools that reports tests only on a multi-core workstation ([16], [17], [18], [19]), Spark has been widely used to implement tools aimed at parallelizing the computation on a distributed computing environment. Most of these tools have been specifically devised for, or tested on, a cloud environment ([20], [21], [22], [23], [24], [25], [26], [27], [28] [29], [30], [31], [32] [33], [34], [35], [36], [37]). Being the increasing availability of IaaS (Infrastructure as a Service) cloud computing services, it is desirable that the released tools are commonly designed to be supported also by such infrastructures.…”
Section: Apache Spark In Life Sciences (mentioning)
confidence: 99%
“…CMAN Ext. tools/frameworks Genomics genome assembly SORA [20] de novo genome assembly GraphX - - - - variant calling DECA [21] copy number variantion discovery MLlib - - - ADAM ADS-HCSpark [48] SNPs and indels calling - - - - - - SparkGA2 [22] variant calling - - - - SparkRA [49] GATK best-practices pipeline - - - - - - DeepVariant on Spark [23] SNPs and indels calling - - Apache Parquet VC@Scale [24] SNPs and indels calling - - - Apache Arrow Halvade Somatic [25] somatic variant calling - - - - - …”
Section: Apache Spark In Life Sciences (mentioning)
confidence: 99%
“…Rather unexpectedly, it has proven possible to use trio WES to identify de novo and inherited copy number variants (4,34,75). Intragenic or genic deletions or duplications account for approximately 2% of causative alleles in developmental disorders (37,57), but this figure may be under-ascertained due to technical issues related to short-read sequencing.…”
Section: Trio-based Whole-exome and Whole-genome Sequencing (mentioning)
confidence: 99%