2017
DOI: 10.1089/cmb.2016.0151

Toward a Better Compression for DNA Sequences Using Huffman Encoding

Abstract: Next-generation sequencing machines generate large amounts of DNA data for genomes ranging in length from megabases to gigabases, so there is a growing need to compress such data to reduce storage space and speed up transmission. Different implementations of Huffman encoding that incorporate the characteristics of DNA sequences are shown to compress DNA data better. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as …
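As a rough illustration of the baseline the abstract builds on, the sketch below applies textbook Huffman coding to non-overlapping k-mers of a DNA string. It is not the paper's method: the repeat selection and tree skewing the abstract mentions are not reproduced, and the function names (`split_kmers`, `huffman_codes`, `encode`, `decode`) and the choice of k are assumptions made for this example.

```python
import heapq
from collections import Counter
from itertools import count


def split_kmers(seq, k):
    """Split seq into consecutive non-overlapping k-mers (the last may be shorter)."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def huffman_codes(symbols):
    """Build a standard Huffman code table {symbol: bitstring} from a symbol list."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 / 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, codes):
    """Concatenate the code of every symbol into one bit string."""
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Decode a bit string; works because Huffman codes are prefix-free."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    seq = "ACGTACGTACGTAAAA"  # toy input, not real sequencing data
    kmers = split_kmers(seq, 4)
    codes = huffman_codes(kmers)
    bits = encode(kmers, codes)
    print(codes, bits, decode(bits, codes) == seq)
```

Frequent k-mers receive shorter codes, which is the property that a skewed tree is meant to exploit. A real compressor would also need to store the code table alongside the bit stream.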

Cited by 23 publications (19 citation statements)
References 12 publications
“…In 1993 the first specialized DNA compressor was proposed (Grumbach and Tahi, 1993). Since then, numerous DNA compressors were developed (e.g., Cao et al, 2007, Li et al, 2013, Benoit et al, 2015, Al-Okaily et al, 2017). In our experience only two compressors pass the practicality threshold: DELIMINATE (Mohammed et al, 2012) and MFCompress (Pinho and Pratas, 2014).…”
Section: Introduction
mentioning
confidence: 88%
“…UHT [34] DNA compression based on using Huffman coding. Unbalanced Huffman encoding/Tree, forcing the Huffman tree to be unbalanced to be better than the standard Huffman.…”
Section: Solution Contents
mentioning
confidence: 99%
“…UHTL [34] DNA compression based on using Huffman coding. Developed version of UHT that prioritizes encoding the k-mers that contain the least frequent base.…”
Section: Solution Contents
mentioning
confidence: 99%
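The statement above says only that UHTL prioritizes k-mers containing the least frequent base. It does not give the selection rule, so the following is a hypothetical sketch of one way such a prioritization could look. The function names, the non-overlapping k-mer split, and the tie-breaking order are all assumptions for illustration, not the published algorithm.

```python
from collections import Counter


def least_frequent_base(seq):
    """Return the rarest base; ties are broken alphabetically (an assumption)."""
    counts = Counter(seq)
    return min(counts, key=lambda b: (counts[b], b))


def prioritize_kmers(seq, k):
    """Order distinct k-mers so those containing the rarest base come first.

    Within each group, k-mers are sorted by descending count, then
    alphabetically. This ordering is illustrative only.
    """
    rare = least_frequent_base(seq)
    # Count complete, non-overlapping k-mers; a short trailing piece is ignored.
    kmers = Counter(seq[i:i + k] for i in range(0, len(seq) - k + 1, k))
    return sorted(kmers, key=lambda m: (rare not in m, -kmers[m], m))


if __name__ == "__main__":
    seq = "AAACAAACGGTTAAAA"  # toy input
    print(least_frequent_base(seq), prioritize_kmers(seq, 4))
```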
“…It is required to reduce the storage size and the processing costs, as well as aid in fast searching retrieval information, and increase the transmission speed over the internet with limited bandwidth [13,22]. In general, compression can be either lossless or lossy; in lossless compression, no information is lost and the original data can be recovered exactly, while in lossy compression, only an approximation of the original data can be recovered [23][24][25][26]. Many researchers believe that lossless compression schemes are particularly needed for biological and medical data, which cannot afford to lose any part of their data [3].…”
mentioning
confidence: 99%
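The lossless/lossy distinction in the statement above can be shown with a small round-trip check. This is a minimal sketch: `zlib` (DEFLATE) stands in for any general-purpose lossless compressor and is not a DNA-specific tool, and `lossy_roundtrip` is a made-up toy scheme used only to show that the original cannot be recovered exactly.

```python
import zlib


def lossless_roundtrip(seq):
    """Compress and decompress with zlib; the output must equal the input."""
    packed = zlib.compress(seq.encode("ascii"))
    return zlib.decompress(packed).decode("ascii")


def lossy_roundtrip(seq, keep_every=2):
    """Toy lossy scheme (hypothetical): keep every n-th base, then repeat
    each kept base to approximate the original length."""
    kept = seq[::keep_every]
    return "".join(b * keep_every for b in kept)[:len(seq)]


if __name__ == "__main__":
    seq = "ACGTACGTTTGACA"  # toy input
    print(lossless_roundtrip(seq) == seq)  # exact recovery
    print(lossy_roundtrip(seq) == seq)     # only an approximation
```

For sequence data, a single changed base can alter the biological interpretation, which is why the quoted passage stresses lossless schemes.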