BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data

Soe, S. S. E.; Park, Yoonjae; Chae, Han‐Jung

doi:10.1186/s12859-018-2498-2

Cited by 6 publications

(6 citation statements)

References 25 publications

(26 reference statements)


“…For a small amount of data, BatMeth2 (Zhou et al, 2019) is recommended because of its accuracy and map ability. On the other hand, BiSpark (Soe et al, 2018) is better for a large amount of data. For researchers not good at programming, ViAliBS (Li et al, 2017) is more user-friendly for its graphical user interface.…”

Section: Discussion (mentioning)

confidence: 99%


Evaluating the Consistency of Gene Methylation in Liver Cancer Using Bisulfite Sequencing Data

Zheng et al. 2021

Front. Cell Dev. Biol.


Bisulfite sequencing is considered as the gold standard approach for measuring DNA methylation, which acts as a pivotal part in regulating a variety of biological processes without changes in DNA sequences. In this study, we introduced the most prevalent methods for processing bisulfite sequencing data and evaluated the consistency of the data acquired from different measurements in liver cancer. Firstly, we introduced three commonly used bisulfite sequencing assays, i.e., reduced-representation bisulfite sequencing (RRBS), whole-genome bisulfite sequencing (WGBS), and targeted bisulfite sequencing (targeted BS). Next, we discussed the principles and compared different methods for alignment, quality assessment, methylation level scoring, and differentially methylated region identification. After that, we screened differential methylated genes in liver cancer through the three bisulfite sequencing assays and evaluated the consistency of their results. Ultimately, we compared bisulfite sequencing to 450 k beadchip and assessed the statistical similarity and functional association of differentially methylated genes (DMGs) among the four assays. Our results demonstrated that the DMGs measured by WGBS, RRBS, targeted BS and 450 k beadchip are consistently hypo-methylated in liver cancer with high functional similarity.


“…Since alignment is computationally heavy, a natural way to improve efficiency is to compute in parallel. BiSpark (Soe et al, 2018) used Spark engine to execute the three-letter alignment parallelly on the distributed system with load balance. It only took 1/3 to half the time of Bismark according to their results.…”

Section: Methods (mentioning)

confidence: 99%

Evaluating the Consistency of Gene Methylation in Liver Cancer Using Bisulfite Sequencing Data

Zheng et al. 2021

Front. Cell Dev. Biol.


“…The Table 1 also highlights that Spark is also used with other frameworks. In particular, it is often used in conjunction with Hadoop to take advantage of its file system (i.e., HDFS) ([16], [22], [23], [26], [27], [30], [31], [34], [35], [38], [39], [40], [41], [42]) and of its cluster manager (i.e., YARN) ([30], [31], [43]).…”

Section: Apache Spark In Life Sciences (mentioning)

confidence: 99%

“…A highly scalable bisulfite aligner implemented on Spark (called BiSpark), devised to deal with the mapping of reads treated with bisulfite is proposed in [39]. Without going into unnecessary details, a common strategy to map bisulfite treated reads is based on a 3-letter nucleotide alphabet reduction algorithm.…”

Section: Apache Spark In Life Sciences (mentioning)

confidence: 99%

Framing Apache Spark in life sciences

Manconi, Gnocchi, Milanesi et al. 2023

Heliyon
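The 3-letter alphabet reduction mentioned in the statement above can be illustrated with a minimal Python sketch. It is not BiSpark's code: the function names `three_letter_convert` and `call_methylation` and the example sequences are hypothetical, and the sketch assumes an ungapped, forward-strand alignment. The idea it shows is the general one: bisulfite treatment makes unmethylated C read as T, so both read and reference are reduced to a C-free alphabet for matching, and the original bases are compared afterwards to infer methylation.

```python
def three_letter_convert(seq: str, strand: str = "forward") -> str:
    """Reduce a sequence to a 3-letter alphabet (hypothetical helper).

    forward: every C becomes T; reverse: every G becomes A.
    """
    seq = seq.upper()
    if strand == "forward":
        return seq.replace("C", "T")
    if strand == "reverse":
        return seq.replace("G", "A")
    raise ValueError("strand must be 'forward' or 'reverse'")


def call_methylation(read: str, ref_window: str) -> list[tuple[int, bool]]:
    """Compare the ORIGINAL read and reference bases after alignment.

    Assumes an ungapped alignment of equal-length strings. At each
    reference C, a read C is reported as methylated (True) and a read T
    as unmethylated (False); any other read base is skipped.
    """
    calls = []
    for pos, (r, g) in enumerate(zip(read.upper(), ref_window.upper())):
        if g == "C":
            if r == "C":
                calls.append((pos, True))
            elif r == "T":
                calls.append((pos, False))
    return calls


# Illustrative data only: the read differs from the reference at index 4,
# where an unmethylated C was converted and is read as T.
reference = "ACGTCCGA"
read = "ACGTTCGA"

# After reduction the two sequences match exactly, so an ordinary
# aligner can place the read despite the bisulfite-induced mismatch.
matches = three_letter_convert(read) == three_letter_convert(reference)
calls = call_methylation(read, reference)
```

Here `matches` is `True`, and `calls` marks the cytosines at indices 1 and 5 as methylated and the one at index 4 as unmethylated. Real aligners additionally handle both strands, gaps, and mismatches unrelated to conversion.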


“…Trimmomatic [16], TrimGalore (https://github.com/FelixKrueger/TrimGalore), Cutadapt [17]), alignment of reads to a reference genome and generation of methylation calls (e.g. BSseeker2 [18], BSseeker3 [19], Bismark [20], BSMap [21], bwa-meth (https://github.com/brentp/bwameth/), BRAT-nova [22], BiSpark [23], WALT [24], segemehl [25]). From a computational standpoint, data pre-processing is by far the most time-consuming step in the entire bulk or single-cell WGBS analysis workflow (Fig. 1).…”

Section: Introduction (mentioning)

confidence: 99%

MethylStar: A fast and robust pre-processing pipeline for bulk or single-cell whole-genome bisulfite sequencing data

2020


Background: Whole-Genome Bisulfite Sequencing (WGBS) is a Next Generation Sequencing (NGS) technique for measuring DNA methylation at base resolution. Continuing drops in sequencing costs are beginning to enable high-throughput surveys of DNA methylation in large samples of individuals and/or single cells. These surveys can easily generate hundreds or even thousands of WGBS datasets in a single study. The efficient pre-processing of these large amounts of data poses major computational challenges and creates unnecessary bottlenecks for downstream analysis and biological interpretation. Results: To offer an efficient analysis solution, we present MethylStar, a fast, stable and flexible pre-processing pipeline for WGBS data. MethylStar integrates well-established tools for read trimming, alignment and methylation state calling in a highly parallelized environment, manages computational resources and performs automatic error detection. MethylStar offers easy installation through a dockerized container with all preloaded dependencies and also features a user-friendly interface designed for experts/non-experts. Application of MethylStar to WGBS from Human, Maize and A. thaliana shows favorable performance in terms of speed and memory requirements compared with existing pipelines. Conclusions: MethylStar is a fast, stable and flexible pipeline for high-throughput pre-processing of bulk or single-cell WGBS data. Its easy installation and user-friendly interface should make it a useful resource for the wider epigenomics community. MethylStar is distributed under GPL-3.

