2018
DOI: 10.1186/s12859-018-2498-2

BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data

Abstract: Background: Bisulfite sequencing is one of the major high-resolution DNA methylation measurement methods. Because sodium bisulfite treatment selectively converts unmethylated cytosines, processing bisulfite-treated sequencing reads requires additional steps that are computationally demanding. However, the dearth of efficient aligners designed for bisulfite-treated sequencing has become a bottleneck for large-scale DNA methylome analyses. Results: In this study, we present a highly sca…

Cited by 6 publications (6 citation statements)
References 25 publications (26 reference statements)
“…For a small amount of data, BatMeth2 (Zhou et al, 2019) is recommended because of its accuracy and map ability. On the other hand, BiSpark (Soe et al, 2018) is better for a large amount of data. For researchers not good at programming, ViAliBS (Li et al, 2017) is more user-friendly for its graphical user interface.…”
Section: Discussion (mentioning)
confidence: 99%
“…Since alignment is computationally heavy, a natural way to improve efficiency is to compute in parallel. BiSpark ( Soe et al, 2018 ) used Spark engine to execute the three-letter alignment parallelly on the distributed system with load balance. It only took 1/3 to half the time of Bismark according to their results.…”
Section: Methods (mentioning)
confidence: 99%
“…The Table 1 also highlights that Spark is also used with other frameworks. In particular, it is often used in conjunction with Hadoop to take advantage of its file system (i.e., HDFS) ([16], [22], [23], [26], [27], [30], [31], [34], [35], [38], [39], [40], [41], [42]) and of its cluster manager (i.e., YARN) ([30], [31], [43]).…”
Section: Apache Spark In Life Sciences (mentioning)
confidence: 99%
“…A highly scalable bisulfite aligner implemented on Spark (called BiSpark), devised to deal with the mapping of reads treated with bisulfite is proposed in [39]. Without going into unnecessary details, a common strategy to map bisulfite treated reads is based on a 3-letter nucleotide alphabet reduction algorithm.…”
Section: Apache Spark In Life Sciences (mentioning)
confidence: 99%
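
The 3-letter alphabet reduction mentioned in the statement above can be illustrated with a minimal Python sketch. This is a generic illustration of the idea, not BiSpark's actual code: the function names `to_three_letter` and `call_methylation` are hypothetical, and the example assumes an ungapped, same-length alignment of a forward-strand read against its reference window.

```python
def to_three_letter(seq: str, strand: str = "forward") -> str:
    """Collapse a DNA sequence to a 3-letter alphabet (hypothetical helper).

    forward: every C becomes T, so converted and unconverted cytosines
             look identical during alignment.
    reverse: every G becomes A, the same collapse seen on the
             complementary strand.
    """
    seq = seq.upper()
    if strand == "forward":
        return seq.replace("C", "T")
    if strand == "reverse":
        return seq.replace("G", "A")
    raise ValueError("strand must be 'forward' or 'reverse'")


def call_methylation(read: str, reference: str) -> list[tuple[int, bool]]:
    """Compare the ORIGINAL read and reference after alignment.

    Returns (position, is_methylated) for each reference cytosine:
    a C in the read means the base resisted conversion (methylated),
    a T means it was converted (unmethylated).
    Assumes an ungapped alignment of equal-length strings.
    """
    calls = []
    for pos, (r, g) in enumerate(zip(read.upper(), reference.upper())):
        if g == "C" and r in ("C", "T"):
            calls.append((pos, r == "C"))
    return calls


reference = "ACGTCCGA"
read = "ATGTCTGA"  # bisulfite-treated read from the same locus

# Both sequences collapse to the same 3-letter string, so they align.
assert to_three_letter(read) == to_three_letter(reference)

print(call_methylation(read, reference))
```

Alignment is done on the collapsed sequences, while methylation is called from the original bases, which is why both versions of each read have to be kept.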
“…Trimmomatic [16], TrimGalore (https://github.com/FelixKrueger/TrimGalore), Cutadapt [17]), alignment of reads to a reference genome and generation of methylation calls (e.g. BSseeker2 [18], BSseeker3 [19], Bismark [20], BSMap [21], bwa-meth (https://github.com/brentp/bwameth/), BRAT-nova [22], BiSpark [23], WALT [24], segemehl [25]). From a computational standpoint, data pre-processing is by far the most time-consuming step in the entire bulk or single-cell WGBS analysis workflow (Fig. 1).…”
Section: Introduction (mentioning)
confidence: 99%