2011 IEEE Statistical Signal Processing Workshop (SSP) 2011
DOI: 10.1109/ssp.2011.5967637

Bacteria DNA sequence compression using a mixture of finite-context models

Abstract: The ability of finite-context models for compressing DNA sequences has been demonstrated on some recent works. In this paper, we further explore this line, proposing a compression method based on eight finite-context models, with orders from two to sixteen, whose probabilities are averaged using weights calculated through a recursive procedure. The method was tested on a total of 2,338 sequences belonging to bacterial genomes, with sizes ranging from 1,286 to 13,033,779 bases, showing better compression result…
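
As a rough illustration of the idea in the abstract, the sketch below averages the probabilities of several finite-context models using weights that are updated recursively after each base. It is not the authors' implementation: the weight rule (raise each weight to a power `gamma`, multiply by the probability the model gave to the observed base, then normalize), the smoothing parameter `alpha`, the small default orders, and all names are assumptions, since the abstract does not specify them.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class FiniteContextModel:
    """Order-k model: counts which base follows each k-base context."""

    def __init__(self, order, alpha=1.0):
        self.order = order
        self.alpha = alpha  # assumed additive-smoothing parameter
        self.counts = defaultdict(lambda: [0] * len(ALPHABET))

    def _context(self, history):
        # history[-0:] would return the whole string, so order 0 is special-cased.
        return history[-self.order:] if self.order else ""

    def probs(self, history):
        c = self.counts[self._context(history)]
        total = sum(c) + self.alpha * len(ALPHABET)
        return [(n + self.alpha) / total for n in c]

    def update(self, history, symbol):
        self.counts[self._context(history)][ALPHABET.index(symbol)] += 1


def mixture_code_length(sequence, orders=(2, 4, 8), gamma=0.9):
    """Ideal code length in bits of `sequence` under the weighted mixture."""
    models = [FiniteContextModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    max_order = max(orders)
    bits = 0.0
    for i, symbol in enumerate(sequence):
        history = sequence[max(0, i - max_order):i]
        s = ALPHABET.index(symbol)
        per_model = [m.probs(history) for m in models]
        # Mixture probability: weighted average of the model probabilities.
        p = sum(w * pm[s] for w, pm in zip(weights, per_model))
        bits += -math.log2(p)
        # Assumed recursive update: models that predicted well gain weight.
        weights = [w ** gamma * pm[s] for w, pm in zip(weights, per_model)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        for m in models:
            m.update(history, symbol)
    return bits
```

Dividing the returned value by `len(sequence)` gives bits per base, which can be compared against the 2 bits per base of a plain four-symbol encoding. In a real compressor the mixture probability would drive an arithmetic coder rather than a code-length sum.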

Citations: cited by 38 publications (27 citation statements)
References: 11 publications
“…The GeCo algorithm [50], derived from [51,52], exploits a combination of context models of several orders for reference-free, as well as for reference-based genomic sequence compression. In this method, extended finite-context models (XFCMs), that are tolerant against substitution errors, are introduced.…”
Section: Reference-free Methods (mentioning)
confidence: 99%
“…Working in DNA compression was initially presented by Grumbach and Tahi in their pioneer work of DNA sequences compression by BioCompress Algorithm (Pinho et al, 2011) and its second version BioCompress-2, these algorithms are based on Ziv-Lempel compression technique (Berger and Mortensen, 2010), BioCompress-2 search for exact repeats in already encoded sequences, then encodes that repeats by repeat length and the position of preceding repeat appeared, when no repetition is found it uses order-2 arithmetic coding (Lin et al, 2009).…”
Section: Related Work (mentioning)
confidence: 99%
“…Instead, they start with a uniformly distributed model, and update it continuously, for example, through the use of counters, to estimate the probabilistic models based on incoming symbols [15].…”
Section: Updating Probabilistic Models (mentioning)
confidence: 99%
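
The counter-based updating described in the statement above can be sketched as follows. This is a minimal illustration, assuming one pseudo-count per symbol so that the first estimate is uniform; the function name and the four-letter alphabet are hypothetical and not taken from the cited work.

```python
from collections import Counter


def adaptive_estimates(symbols, alphabet="ACGT"):
    """Return the probability estimate in force before each incoming symbol."""
    # One pseudo-count per symbol (an assumption) gives a uniform starting model.
    counts = Counter({a: 1 for a in alphabet})
    estimates = []
    for s in symbols:
        total = sum(counts.values())
        estimates.append({a: counts[a] / total for a in alphabet})
        counts[s] += 1  # update the model with the symbol just seen
    return estimates
```

Because the estimate is computed before the counter is incremented, a decoder that has seen the same symbols so far can rebuild the same model without any side information.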
“…Finally, FCM gained protagonism in [13][14][15][16], implemented up to order-16. As in the previous cases, the FCMs were followed by an arithmetic encoder, and several different-order FCMs compete.…”
Section: Introduction (mentioning)
confidence: 99%