2015
DOI: 10.1109/tcbb.2015.2403370

Compression of Multiple DNA Sequences Using Intra-Sequence and Inter-Sequence Similarities

Abstract: Traditionally, intra-sequence similarity is exploited for compressing a single DNA sequence. Recently, remarkable compression performance of individual DNA sequence from the same population is achieved by encoding its difference with a nearly identical reference sequence. Nevertheless, there is lack of general algorithms that also allow less similar reference sequences. In this work, we extend the intra-sequence to the inter-sequence similarity in that approximate matches of subsequences are found between the …

Cited by 11 publications (10 citation statements)
References 39 publications (63 reference statements)
“…GDC2 [10] applies a two-level Ziv Lempel factorization [27] to compress large set of genome sequences. MSC [16] utilizes both intra-sequence and inter-sequence similarities for compression via searching subsequence matches in reference sequence and other parts of the target sequence itself, the compression order is determined by a recursive full search algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
“…Reference-based genome compression for compressing a single genome sequence has been intensively studied and achieved much higher compression ratio than reference free compression [8]. Existing reference-based genome compression algorithms include GDC [9], GDC2 [10], iDoComp [11], ERGC [12], HiRGC [13], CoGI [14], RlZAP [15], MSC [16], RCC [17], NRGC [18], SCCG [19] and FRESCO [20]. A straightforward application of these reference-based compression algorithms to solve the challenging problem of compressing a database containing n number of genome sequences is to conduct a one-by-one sequential reference-based compression for every genome in the database using one fixed reference genome.…”
Section: Introduction (mentioning)
confidence: 99%
“…Seed-based methods are used frequently in DNA compression methods. Examples include DNACompress [30], MSC [56] and RCC [60].…”
Section: Signal Processing Techniques For Identifying Sequence Simila… (mentioning)
confidence: 99%
“…In other words, similar subsequences are encoded once only which act as references to their occurrence at other locations of the same sequence or other sequences. Examples of methods in this group include DNAzip [12], RLCSA [46], RLZ [47,48], GRS [49], GReEn [50], iDoComp [51], COMRAD [52,53], ERGC [54], CoGI [55], MSC [56], GDC [57,58], FRESCO [59] and RCC [60]. With the use of appropriate reference sequences, the storage size reduction in some cases can be over 90%.…”
Section: Introduction (mentioning)
confidence: 99%
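
The citation statements above describe encoding a target sequence by pointing at matching subsequences, either in a reference sequence (inter-sequence similarity) or earlier in the target itself (intra-sequence similarity). The following is a minimal Python sketch of that general idea, not the MSC algorithm or any of the cited tools: it uses exact matches only, a brute-force search, and a greedy longest-match choice. The names `longest_match`, `encode`, `decode`, and the `min_len` threshold are illustrative assumptions.

```python
from typing import List, Tuple, Union

# A factor is either a single literal base, or a copy (source, start, length)
# where source is "ref" (inter-sequence) or "self" (intra-sequence).
Factor = Union[str, Tuple[str, int, int]]


def longest_match(source: str, target: str, i: int) -> Tuple[int, int]:
    """Brute-force search for the longest prefix of target[i:] inside source.

    Returns (start, length); length is 0 when nothing matches.
    """
    best_start, best_len = -1, 0
    for s in range(len(source)):
        k = 0
        while (s + k < len(source) and i + k < len(target)
               and source[s + k] == target[i + k]):
            k += 1
        if k > best_len:
            best_start, best_len = s, k
    return best_start, best_len


def encode(target: str, reference: str, min_len: int = 4) -> List[Factor]:
    """Greedily factor target into copies from reference or from its own prefix."""
    factors: List[Factor] = []
    i = 0
    while i < len(target):
        r_start, r_len = longest_match(reference, target, i)
        # Only the already-encoded prefix target[:i] may serve as a source.
        s_start, s_len = longest_match(target[:i], target, i)
        if max(r_len, s_len) < min_len:
            factors.append(target[i])  # too short to be worth a pointer
            i += 1
        elif r_len >= s_len:
            factors.append(("ref", r_start, r_len))
            i += r_len
        else:
            factors.append(("self", s_start, s_len))
            i += s_len
    return factors


def decode(factors: List[Factor], reference: str) -> str:
    """Rebuild the target from its factors and the same reference."""
    out = ""
    for f in factors:
        if isinstance(f, str):
            out += f
        else:
            src, start, length = f
            source = reference if src == "ref" else out
            out += source[start:start + length]
    return out
```

Calling `decode(encode(target, reference), reference)` should return `target` unchanged. A practical compressor would replace the quadratic search with an index such as the seed-based lookups mentioned above, allow approximate matches, and entropy-code the resulting factors.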