2016
DOI: 10.1007/s11227-016-1753-4

Performance comparison of sequential and parallel compression applications for DNA raw data

Cited by 9 publications (11 citation statements)
References 47 publications
“…39 Several lossless non-referential compressors for FASTQ have been released since then 16,23,27–35,39–41 and most of them have been reviewed and tested in detail. 9,17,25,42 In our previous tests, top performers achieved compression ratios in the range between 4:1 and 8:1, which is still below from what a referential compressor could theoretically achieve. Also, restrictions related to input data features (file size, read size, technology of the sequencing machine), excessive runtime or low compression ratios, have limited the usage and effectiveness of non-referential compressors.…”
Section: Related Work (mentioning)
confidence: 66%
“…These types of tools have seen widespread use for compressing biological sequences 22,23 due to their compatibility, robustness, and ease of use, in spite of certain performance limitations. 24,25…”
Section: Introduction (mentioning)
confidence: 99%
“…Evaluating the effectiveness of high-throughput sequencing data compression tools has gained a lot of interest in the last few years [1, 13–15]. Comparative reviews of prominent general-purpose as well as DNA-specific compression algorithms show that DNA compression algorithms tend to compress DNA sequences much better than general-purpose compression algorithms [1, 4].…”
Section: Discussion (mentioning)
confidence: 99%
“…read identifier and read sequence) are compressed using MFCompress after the identifier stream is pre-processed to comply with the format restrictions of MFCompress. The third stream is discarded during compression as it contains a ’+’ symbol followed by an optional comment similar to identifier field which can be regenerated later at the time of decompression [13]. This is similar to all available tools including those used for comparison in this study.…”
Section: Methods (mentioning)
confidence: 99%
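
The stream separation described in the statement above can be illustrated with a minimal sketch. It assumes a well-formed four-line FASTQ record layout and uses hypothetical helper names (`split_fastq_streams`, `rebuild_fastq`) that are not taken from the citing paper or from MFCompress. The compression step itself is left out; the sketch only shows that the '+' line can be dropped and regenerated later.

```python
from typing import List, Tuple


def split_fastq_streams(lines: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split FASTQ lines into identifier, sequence and quality streams.

    Assumption: every record is exactly four lines (header, bases, '+', qualities).
    The '+' line is not stored, since it can be regenerated.
    """
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    ids, seqs, quals = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        ids.append(header[1:])  # drop the leading '@'
        seqs.append(seq)
        quals.append(qual)
    return ids, seqs, quals


def rebuild_fastq(ids: List[str], seqs: List[str], quals: List[str],
                  repeat_id: bool = False) -> List[str]:
    """Rebuild FASTQ lines, regenerating the discarded '+' line.

    repeat_id=True re-emits the identifier after '+', for files that used
    the optional comment form.
    """
    out = []
    for ident, seq, qual in zip(ids, seqs, quals):
        plus = "+" + ident if repeat_id else "+"
        out.extend(["@" + ident, seq, plus, qual])
    return out


records = ["@r1 desc", "ACGT", "+", "IIII", "@r2", "GGCA", "+", "FFFF"]
ids, seqs, quals = split_fastq_streams(records)
restored = rebuild_fastq(ids, seqs, quals)
```

In this sketch the identifier and sequence streams would each be handed to a separate compressor, and a bare '+' is restored at decompression unless `repeat_id` is set.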
“…In this case, quality scores usually account for more than half of the file [19], and it is no doubt noisier than the DNA sequence alone. The complexity of FASTQ [20] makes it harder to analyze or process; this will also be more thoroughly discussed in Section 3.2.…”
Section: DNA Data Composition (mentioning)
confidence: 99%