2013
DOI: 10.1093/bioinformatics/btt594

MFCompress: a compression tool for FASTA and multi-FASTA data

Abstract: Motivation: The data deluge phenomenon is becoming a serious problem in most genomic centers. To alleviate it, general purpose tools, such as gzip, are used to compress the data. However, although pervasive and easy to use, these tools fall short when the intention is to reduce as much as possible the data, for example, for medium- and long-term storage. A number of algorithms have been proposed for the compression of genomics data, but unfortunately only a few of them have been made available as usable and re…


Citations: cited by 80 publications (53 citation statements)
References: 12 publications (16 reference statements)
“…In the case of FASTA and Multi-FASTA data, it is clear that special purpose methods, such as DELIMINATE [87] and MFCompress [90] provide an effective increase in compression ratio, although, as expected, at the cost of some additional computational resources. Also, although both DELIMINATE and MFCompress seem to provide equivalent performance-usually, DELIMINATE is faster and uses less memory, but does not compress as good as MFCompress-the choice of MFCompress to integrate a recent pipeline (MetaCRAM [93]) suggests that MFCompress is the current best choice for FASTA and Multi-FASTA data compression.…”
Section: Discussion (mentioning)
confidence: 72%
“…The MFCompress method [90] exploits finite-context models, which are probabilistic models that select the probability distribution by estimating the probability of the next symbol in the sequence based on the k previous symbols (order-k context). MFCompress encodes the sequence headers using single finite-context models and encodes the DNA sequences using multiple competing finite-context models [41], along with arithmetic coding.…”
Section: FASTA/Multi-FASTA (mentioning)
confidence: 99%
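
The order-k finite-context idea described in the statement above can be illustrated with a minimal sketch. It is not MFCompress's implementation: the function name `fcm_code_length`, the smoothing parameter `alpha`, and the restriction to the four-letter alphabet `ACGT` are assumptions made here for illustration. Instead of driving a real arithmetic coder, the sketch sums -log2(p) over the symbols, which is the code length an ideal arithmetic coder would approach.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: plain DNA symbols only, no N or IUPAC codes


def fcm_code_length(seq, k, alpha=1.0):
    """Ideal code length in bits of seq under an adaptive order-k model.

    The probability of each symbol is estimated from how often it has
    followed the same k previous symbols so far (hypothetical helper,
    not part of any real tool).
    """
    # counts[context][symbol] = times symbol has followed context so far
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    total_bits = 0.0
    for i, symbol in enumerate(seq):
        # The first k symbols are coded with a shorter context
        context = seq[max(0, i - k):i]
        ctx_counts = counts[context]
        total = sum(ctx_counts.values())
        # Additive smoothing keeps unseen symbols at nonzero probability
        p = (ctx_counts[symbol] + alpha) / (total + alpha * len(ALPHABET))
        total_bits += -math.log2(p)
        # Update after coding, so a decoder could mirror the same state
        ctx_counts[symbol] += 1
    return total_bits


if __name__ == "__main__":
    sequence = "ACGT" * 50
    # A crude stand-in for "competing" models: compare several orders on
    # the whole sequence and report the cheapest one.
    for order in (0, 2, 4):
        print(order, round(fcm_code_length(sequence, order), 1))
```

On a highly repetitive input such as the one in the usage example, the higher-order models need far fewer bits than the order-0 model, which is the effect that competing models of different orders are meant to exploit. How the competition between models is actually arranged in MFCompress is not stated in the citation above, so the whole-sequence comparison here is only a simplification.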
“…This highlights the disconnect between data production and data storage and the resource to process these data. However, compression techniques [19]- [21] could have the potential to help with the storage and retrieval of these huge data files.…”
Section: HTS Platform Challenges (mentioning)
confidence: 99%
“…Our study here does not aim to develop the compression tool for genome sequence. Pinho and Pratas (2014) present MFCompress that is used for compressing FASTA. The core algorithm of MFCompress is based on the context modeling technology.…”
Section: Introduction (mentioning)
confidence: 99%