2017
DOI: 10.1093/bioinformatics/btx639

Compression of genomic sequencing reads via hash-based reordering: algorithm and analysis

Abstract: Supplementary material is available at Bioinformatics online. The proposed algorithm is available for download at https://github.com/shubhamchandak94/HARC.
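The title's core idea is to reorder reads so that reads which overlap in the genome end up next to each other. Each read can then be stored as a shift relative to its predecessor plus the few new bases it contributes. The following is a minimal sketch of that idea, not HARC's implementation (see the repository above for that). It assumes fixed-length reads, exact overlaps, forward strand only, a single hash table keyed on each read's first k bases, and greedy extension. A real tool must also cope with, e.g., sequencing errors. The function names `reorder_reads`, `find_next`, `encode` and `decode` and the parameters `k` and `max_shift` are hypothetical.

```python
from collections import defaultdict

def find_next(reads, cur, index, used, k, max_shift):
    """Find an unused read that starts s bases into reads[cur], 1 <= s <= max_shift."""
    r = reads[cur]
    for s in range(1, max_shift + 1):
        key = r[s:s + k]
        if len(key) < k:
            break
        # Hash lookup: candidate reads whose first k bases equal r[s:s+k].
        for j in index.get(key, ()):
            overlap = r[s:]
            # Simplification: require an exact match over the whole overlap.
            if not used[j] and reads[j][:len(overlap)] == overlap:
                return j, s
    return None

def reorder_reads(reads, k=4, max_shift=3):
    """Greedy hash-based reordering; returns (read_index, shift) pairs.

    shift is the offset of a read relative to the previous read in the new
    order, or None when the read starts a new run.
    """
    index = defaultdict(list)
    for i, r in enumerate(reads):
        index[r[:k]].append(i)  # hash table keyed on each read's first k bases

    used = [False] * len(reads)
    order = []
    for start in range(len(reads)):
        if used[start]:
            continue
        used[start] = True
        order.append((start, None))
        cur = start
        # Extend the current run while some unused read overlaps the last one.
        while True:
            nxt = find_next(reads, cur, index, used, k, max_shift)
            if nxt is None:
                break
            j, s = nxt
            used[j] = True
            order.append((j, s))
            cur = j
    return order

def encode(reads, order):
    """Store each read as (shift, new bases) relative to its predecessor."""
    out, prev = [], None
    for j, s in order:
        if s is None:
            out.append((None, reads[j]))  # run start: store the full read
        else:
            overlap_len = len(reads[prev]) - s
            out.append((s, reads[j][overlap_len:]))  # only the non-overlapping tail
        prev = j
    return out

def decode(encoded):
    """Rebuild the reads (in reordered order) from (shift, tail) pairs."""
    out, prev = [], None
    for s, tail in encoded:
        r = tail if s is None else prev[s:] + tail
        out.append(r)
        prev = r
    return out

# Toy reads of length 8 from positions 0, 5, 2, 3 of "ACGTTGCATGCCA" (shuffled).
reads = ["ACGTTGCA", "GCATGCCA", "GTTGCATG", "TTGCATGC"]
order = reorder_reads(reads)
encoded = encode(reads, order)
print(order)    # [(0, None), (2, 2), (3, 1), (1, 2)]
print(encoded)  # [(None, 'ACGTTGCA'), (2, 'TG'), (1, 'C'), (2, 'CA')]
assert decode(encoded) == [reads[j] for j, _ in order]
```

On this toy input the reordered encoding stores 13 bases instead of the 32 in the raw reads. Keying the hash table on a fixed-position prefix keeps each candidate-shift lookup cheap. However, it misses overlaps whose first k bases contain an error, so a practical tool would likely need to index more than one substring per read.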

Cited by 39 publications (27 citation statements). References: 17 publications.

“…For the evaluation we used the state-of-the-art competitors, i.e., FaStore [12], Spring [14], and Minicom [15]. We resigned from testing some other good compressors like BEETL [8], Orcom [10], AssembleTrie [22], and HARC [13] as the previous works demonstrated that they perform worse than the picked tools. The older utilities are not competitive in terms of compression ratio, as was demonstrated in the recent papers [12,14] (see also Table 1 for experiments with one of our datasets).…”
Section: Results / Tools and Datasets; mentioning, confidence: 99%
“…In [12], it was shown how to group reads from a bit larger genome regions. Significantly better results were, however, obtained in three recent articles presenting HARC [13], Spring [14], and Minicom [15]. The attempts differ in details, but are based on similar ideas.…”
Mentioning, confidence: 83%
“…This mode is suitable, for example, for long term storage of raw sequencing data. Examples of preprocessing technologies of which the data output can be represented using the decoder syntax defined in MPEG-G for this mode of operation include those presented in ORCOM [13], HARC [14], FaStore [7] and, in general, all (future) preprocessing technologies that cluster reads based on common patterns.…”
Section: Compression Modes For Raw Sequencing Data; mentioning, confidence: 99%
“…As a result, computational methods that reduce memory usage, e.g., by representing genomic data more compactly, or making inference on the fly by processing genomic data in an on-line manner are of high demand. Among these approaches, lossless compression methods on raw, mapped or indexed data [2,3,4,5,6,7] have been highly successful; for example, recent standardization efforts by MPEG-G [8,9] or GA4GH [10]), especially in the context of raw (FASTQ) and mapped (SAM/BAM) read collections based on current generation compression methods, have demonstrated that it is possible to reduce the size of genomic data by an order of magnitude. Similarly, on-line genomic data processing methods (e.g.…”
Section: Introduction; mentioning, confidence: 99%