2011
DOI: 10.4137/ebo.s6618

SOLiDzipper: A High Speed Encoding Method for the Next-Generation Sequencing Data

Abstract: Background: Next-generation sequencing (NGS) methods pose computational challenges of handling large volumes of data. Although cloud computing offers a potential solution to these challenges, transferring a large data set across the internet is the biggest obstacle, which may be overcome by efficient encoding methods. When encoding is used to facilitate data transfer to the cloud, the time factor is equally as important as the encoding efficiency. Moreover, to take advantage of parallel processing in cloud comp…
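The abstract argues that, for cloud upload, encoding speed matters as much as compression ratio and that the encoding step should exploit parallel processing. The sketch below illustrates that general idea only, not SOLiDzipper's actual algorithm: it splits a sequencing file into fixed-size blocks and compresses them concurrently in a worker pool. All names and parameters here (CHUNK_SIZE, compress_chunk, the 64 MiB block size, the file names) are hypothetical choices for illustration.

# Illustrative sketch only (not SOLiDzipper's algorithm): block-wise parallel
# compression of a sequencing text file, so that encoding time -- not just
# compression ratio -- stays low before transfer to the cloud.
import gzip
from multiprocessing import Pool

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per block; tune to core count and I/O

def compress_chunk(chunk: bytes) -> bytes:
    # Each block becomes an independent gzip member; standard gzip tools can
    # decompress a concatenation of members back into the original file.
    return gzip.compress(chunk, compresslevel=6)

def read_chunks(path: str):
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(CHUNK_SIZE)
            if not chunk:
                break
            yield chunk

def parallel_encode(in_path: str, out_path: str, workers: int = 4) -> None:
    with Pool(processes=workers) as pool, open(out_path, "wb") as out:
        # imap preserves block order, so the output is a valid multi-member
        # gzip file even though blocks are compressed concurrently.
        for compressed in pool.imap(compress_chunk, read_chunks(in_path)):
            out.write(compressed)

if __name__ == "__main__":
    # Hypothetical file names for illustration.
    parallel_encode("reads.csfasta", "reads.csfasta.gz", workers=8)

Writing each block as an independent gzip member keeps the output readable by ordinary gzip tools while letting the blocks be encoded in parallel; it trades a little compression ratio for encoding speed, which is the trade-off the abstract highlights.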

Cited by 8 publications (7 citation statements)
References 13 publications

“…There has been considerable work on compression of sequencing data, with some researchers specializing only on sequence compression [6], [7] or quality value compression [8], [9] with others supporting full FASTQ file compression; G-SQZ [10], SlimGene [11], SOLiDzipper [12], DSRC [13], Quip [14], SCALCE [15] and KungFQ [16]. Related to this is work on SAM/BAM [17] compression including Goby (F. Campagne, http://campagnelab.org/software/goby/ accessed on July 19, 2012), CRAM [18], SAMZIP [19] and NGC [20].…”
Section: Introduction (mentioning)
confidence: 99%
“…This behavior is typical for parallel algorithms computing very small datasets [17]. In the case of the 1TB dataset, we conclude that our moderate-size cluster, although equipped with a high-performance parallel file system, is not able to keep a linear speedup, and hence a desired efficiency, when 8 or more processing units are concurrently accessing non-contiguous regions of a big dataset (footnote 8).…”
Section: A Performance Evaluation (mentioning)
confidence: 85%
“…Among all the datasets, 10GB has the maximum throughput of 8,441 MB/s (8.4 GB/s) when compressed with 128 processes, which correlates with its superlinear speedup. 128 processing units were used to achieve the… (Footnote 8: This conclusion is based on results obtained by comparing the reading times with those of reading contiguous regions of the 1TB dataset.)…”
Section: A Performance Evaluation (mentioning)
confidence: 99%
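The two statements above reason in terms of speedup, parallel efficiency, and throughput. As a reference for those metrics only, here is a minimal sketch of the standard definitions; the timing values in the usage example are hypothetical placeholders, not numbers taken from the cited papers.

# Standard parallel-performance metrics used in the quoted statements.

def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    # S(p) = T(1) / T(p); S(p) > p indicates superlinear speedup.
    return serial_seconds / parallel_seconds

def efficiency(serial_seconds: float, parallel_seconds: float, workers: int) -> float:
    # E(p) = S(p) / p; linear scaling keeps E(p) close to 1.0.
    return speedup(serial_seconds, parallel_seconds) / workers

def throughput_mb_per_s(input_bytes: int, parallel_seconds: float) -> float:
    # Uncompressed bytes processed per second of wall-clock compression time.
    return input_bytes / (1024 * 1024) / parallel_seconds

if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    t1, t128, p = 1200.0, 8.0, 128
    print(f"speedup    = {speedup(t1, t128):.1f}x")
    print(f"efficiency = {efficiency(t1, t128, p):.2f}")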
“…Algorithms such as Quip [2], G-SQZ [3], DSRC [4], KungFQ [5], SeqDB [6], LW-FQZip [7], LFQC [8], LEON [9], SCALCE [10] and SOLiDzipper [11] take into account the nature of DNA sequences and use different data compression techniques to achieve good ratios. But these algorithms process the data sequentially and do not fully utilize the processing powers of the newest computing resources, such as GPUs or CPU's multi-core technologies, to speed up the compression of an increasing amount of genomic data.…”
Section: Introduction (mentioning)
confidence: 99%