2018
DOI: 10.1101/501130
Preprint

Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences

Abstract: DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF), a new file format for lossless reference-free compression of FASTA and FASTQ-formatted nucleotide sequences. The NAF compression ratio is comparable to that of the best DNA compressors, while decompression is dramatically faster. We compared our format with the DNA compressors DELIMINATE and MFCompress, and with the general-purpose compressors gzip, bzip2, xz,…

Cited by 7 publications (6 citation statements)
References 8 publications (7 reference statements)
“…We tested all DNA sequence compressors that are available and functional in 2020: dnaX [14], XM [15], DELIMINATE [16], Pufferfish [17], DNA-COMPACT [18], MFCompress [19], UHT [20], GeCo [21], GeCo2 [22], JARVIS [23], NAF [24], and NUHT [25]. We also included the relatively compact among homology search database formats: BLAST [26] and 2bit—a database format of BLAT [27].…”
Section: Results | Type: mentioning
confidence: 99%
“…NAF is a tool and a format for storing compressed DNA sequences [8]. The tool supports multiple formats for compression. The algorithm works in lossless reference-free mode.…”
Section: Literature Review | Type: mentioning
confidence: 99%
“…CoMSA is another compression algorithm for FASTA files introduced by Deorowicz et al (2018) based on a generalized Burrows-Wheeler transform. Similarly to MFCompress, The Nucleotide Archival Format (NAF) introduced by Kryukov et al (2019) is another compressor that works on amino acid sequences converted to their corresponding DNA bases by dictionary encoding this transformed string.…”
Section: Introduction | Type: mentioning
confidence: 99%