2020
DOI: 10.1038/s41598-020-57452-6

FQSqueezer: k-mer-based compression of sequencing data

Abstract: The amount of data produced by modern sequencing instruments that needs to be stored is huge. Therefore, it is not surprising that a lot of work has been done in the field of specialized data compression of FASTQ files. The existing algorithms are, however, still imperfect, and the best tools produce quite large archives. We present FQSqueezer, a novel compression algorithm for sequencing data able to process single- and paired-end reads of variable lengths. It is based on the ideas from the famous prediction by …

Citations: cited by 23 publications (15 citation statements)
References: 29 publications
“…Since sequence headers contribute marginally to the sizes of FASTA/FASTQ files, they are compressed with well-established token-based method analogously as in FQSqueezer [23] or ENANO.…”
Section: Colord Overview
Citation type: mentioning
confidence: 99%
“…Because compressors designed for FASTQ data can be trivially adopted for FASTA-formatted inputs, we also included a comprehensive array of compressors designed primarily or specifically for FASTQ data: BEETL [ 28 ], Quip [ 29 ], fastqz [ 10 ], fqzcomp [ 10 ], DSRC 2 [ 30 ], Leon [ 31 ], LFQC [ 32 ], KIC [ 33 ], ALAPY [ 34 ], GTX.Zip [ 35 ], HARC [ 36 ], LFastqC [ 37 ], SPRING [ 38 ], Minicom [ 39 ], and FQSqueezer [ 40 ]. We also included AC—a compressor designed exclusively for protein sequences [ 41 ].…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Minimizer are used to face the two challenges of processing k-mers: the high volume of data due to redundancy and the impossibility or difficulty of partitioning treatment [7]. Data structure to reduce redundancy: Minimizers are used to define data structures where not all the k-mers of a read are stored, but those that are contiguous and have the same minimizer are merged [14]. The product of this fusion is subsequences called super k-mers [8].…”
Section: What Do the Minimizers Contribute To The Processing Of K-mers?
Citation type: mentioning
confidence: 99%
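
The merging of contiguous k-mers into super k-mers described in the statement above can be illustrated with a minimal Python sketch. It is not taken from FQSqueezer or from the citing paper. It assumes the minimizer of a k-mer is its lexicographically smallest substring of length m, and it compares minimizers by value only (real tools often also track the minimizer's position, or use a hash or canonical ordering). The names `minimizer` and `super_kmers` are hypothetical.

```python
def minimizer(kmer: str, m: int) -> str:
    """Return the lexicographically smallest length-m substring of kmer.

    Assumption: plain lexicographic order; real tools may use other orderings.
    """
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def super_kmers(read: str, k: int, m: int) -> list[tuple[str, str]]:
    """Split a read into (minimizer, super k-mer) pairs.

    Consecutive k-mers that share the same minimizer value are merged into
    one substring, so the k-mers of a run are stored once, not one by one.
    """
    if not 0 < m <= k <= len(read):
        return []
    result = []
    start = 0                          # index of the first k-mer in the run
    current = minimizer(read[:k], m)   # minimizer shared by the current run
    for i in range(1, len(read) - k + 1):
        mz = minimizer(read[i:i + k], m)
        if mz != current:
            # The run covers k-mers start..i-1, i.e. read[start : i - 1 + k].
            result.append((current, read[start:i - 1 + k]))
            start, current = i, mz
    # The last run extends to the end of the read.
    result.append((current, read[start:]))
    return result


if __name__ == "__main__":
    for mz, sk in super_kmers("ACGTACGT", k=4, m=2):
        print(mz, sk)
```

For the read `ACGTACGT` with k = 4 and m = 2, the five k-mers fall into three runs, so three super k-mers are stored in place of five separate k-mers. Grouping by minimizer value also gives a natural key for partitioning the data into bins.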