2014
DOI: 10.2174/1574893609666140516010143

Trends in Genome Compression

Abstract: Technological advancements in high-throughput sequencing have led to a tremendous increase in the amount of genomic data produced. With the cost down to 2,000 USD for a single human genome, sequencing dozens of individuals is already feasible today, even for smaller projects or organizations. However, generating the sequence is only one issue; another is storing, managing, and analyzing it. These tasks become more and more challenging due to the sheer size of the data sets and are incre…

Citations: cited by 44 publications (33 citation statements)
References: 60 publications (42 reference statements)

“…For this purpose, the target is aligned to the reference and the mismatches between these sequences are encoded. Since the decompressor has access to the reference sequence(s), the reference-based methods can obtain very high compression rates [26,28,53,54].…”
Section: Reference-based Methods (mentioning)
confidence: 99%
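
The idea in the statement above, encoding only the differences between a target and a reference that the decompressor also holds, can be illustrated with a minimal Python sketch. It is not the method of any cited tool: it assumes the target and reference have equal length and differ only by substitutions, so the alignment step is skipped and insertions or deletions are not handled. The function names are hypothetical.

```python
from typing import List, Tuple

Mismatch = Tuple[int, str]  # (0-based position, base found in the target)


def encode_against_reference(reference: str, target: str) -> List[Mismatch]:
    """Record only the positions where the target differs from the reference.

    Assumption: both sequences are already aligned and have equal length.
    """
    if len(reference) != len(target):
        raise ValueError("this sketch only handles equal-length sequences")
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]


def decode_with_reference(reference: str, mismatches: List[Mismatch]) -> str:
    """Rebuild the target by applying the stored substitutions to the reference."""
    bases = list(reference)
    for position, base in mismatches:
        bases[position] = base
    return "".join(bases)


reference = "ACGTACGTAC"
target = "ACGTTCGTAA"
mismatches = encode_against_reference(reference, target)
restored = decode_with_reference(reference, mismatches)
```

Here the ten-base target is represented by two mismatch records, which hints at why closely related sequences compress so well when a shared reference is available.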
“…Many studies have been carried out on the topic of genomic sequence compression, taking into account the characteristics of these sequences, such as small alphabet size (i.e., four, namely the nucleotides A (adenine), T (thymine), C (cytosine) and G (guanine)), repeats and palindromes [25][26][27][28]. In this section, two categories of reference-free methods, that are based only on the characteristics of the target sequences, and reference-based methods, that exploit a (set of) reference sequence(s), are considered for describing these studies.…”
Section: Genomic Sequence Compression (mentioning)
confidence: 99%
“…Compression with nbit uses a combination of 2-bit encoding for gapless bases (i.e., A=00, C=01, G=10, T=11), and 3-bit encoding for gappy regions. Although this encoding was inspired by prior work (Wandelt et al, 2014), DECIPHER uses a unique implementation that is customized to the package's goals.…”
Section: The Nbit Compression Format For Nucleotides (mentioning)
confidence: 99%
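
The 2-bit mapping quoted above (A=00, C=01, G=10, T=11) can be sketched in Python as follows. This is only an illustration of the bit packing, not DECIPHER's implementation: it assumes a gapless sequence over A, C, G, T, leaves out the 3-bit encoding for gappy regions, and uses hypothetical function names. The sequence length has to be stored separately because the last byte may be padded.

```python
# Mapping taken from the quoted statement; everything else is an assumption.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack four bases per byte, first base in the two highest bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Recover `length` bases from the packed bytes."""
    return "".join(
        BASES[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length)
    )


packed = pack_2bit("ACGTA")
unpacked = unpack_2bit(packed, 5)
```

Five bases occupy two bytes here instead of five, a fourfold reduction in the limit for long gapless sequences stored as one character per byte.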
“…Sequencing throughputs have rapidly outpaced even Moore's Law for computing power, which states that the number of transistors in a dense integrated circuit doubles approximately every two years (195). Improvement of data compression algorithms would help to address the growing data-to-central processing unit (CPU) ratio (287). To date, thousands of WG and hundreds of thousands of exomes have been sequenced.…”
Section: Predicted Paradigm Shift In Disease Diagnosis (mentioning)
confidence: 99%