2016
DOI: 10.1186/s12864-016-2793-0
A new algorithm for “the LCS problem” with application in compressing genome resequencing data

Abstract: Background: The longest common subsequence (LCS) problem is a classical problem in computer science, and forms the basis of the current best-performing reference-based compression schemes for genome resequencing data. Methods: First, we present a new algorithm for the LCS problem. Using the generalized suffix tree, we identify the common substrings shared between the two input sequences. Using the maximal common substrings, we construct a directed acyclic graph (DAG), based on which we determine the LCS as the long…
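The abstract computes the LCS as a path through a DAG built from common substrings. As a minimal sketch of that idea, the hypothetical function `lcs_via_match_dag` below makes a simplifying assumption: instead of the maximal common substrings that a generalized suffix tree would supply, every node is a single matching position pair (a common substring of length 1). An edge joins two matches when the second lies strictly later in both strings, so every path spells a common subsequence and the longest path is an LCS. This is an illustration of the DAG formulation, not the paper's algorithm, and it is quadratic in the number of matches.

```python
from functools import lru_cache


def lcs_via_match_dag(a: str, b: str) -> str:
    """Return one LCS of a and b as the longest path in a DAG of matches.

    Simplification (assumption): each node is a pair (i, j) with a[i] == b[j],
    i.e. a common substring of length 1, rather than a maximal common
    substring found with a generalized suffix tree.
    """
    nodes = [(i, j) for i in range(len(a)) for j in range(len(b)) if a[i] == b[j]]

    @lru_cache(maxsize=None)
    def longest_from(idx: int) -> tuple:
        # Longest path (as a tuple of node indices) that starts at node idx.
        i, j = nodes[idx]
        best = ()
        for nxt, (k, l) in enumerate(nodes):
            if k > i and l > j:  # edge: strictly later in both strings
                cand = longest_from(nxt)
                if len(cand) > len(best):
                    best = cand
        return (idx,) + best

    best_path = ()
    for idx in range(len(nodes)):
        path = longest_from(idx)
        if len(path) > len(best_path):
            best_path = path
    # Read the characters along the path from string a.
    return "".join(a[nodes[idx][0]] for idx in best_path)
```

For example, `lcs_via_match_dag("ABCBDAB", "BDCABA")` returns a common subsequence of length 4. Using longer common substrings as nodes, as the abstract describes, typically yields far fewer nodes when the two inputs share long identical stretches.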

Cited by: 15 publications (12 citation statements)
References: 32 publications (58 reference statements)
“…In this context, the tools MLF [61] and SLF [62], use suffix arrays to create two data lists to search for shared segments. One list is the Longest Previous Factor (LPF), and the other is the Position (POS).…”
Section: Reference Sequence Indexing (mentioning)
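The quote mentions the Longest Previous Factor (LPF) and Position (POS) lists. Under the usual definition, assumed here, LPF[i] is the length of the longest prefix of the suffix starting at i that also starts at some earlier position j < i (the two occurrences may overlap), and POS[i] records such a j. MLF and SLF reportedly derive these from suffix arrays; the hypothetical function `lpf_pos` below uses a naive cubic-time scan instead, only to make the definition concrete.

```python
def lpf_pos(s: str) -> tuple[list[int], list[int]]:
    """Naive Longest Previous Factor (LPF) and Position (POS) arrays.

    LPF[i]: length of the longest prefix of s[i:] that also starts at some
    earlier position j < i (occurrences may overlap).
    POS[i]: the leftmost such j, or -1 when LPF[i] == 0.
    """
    n = len(s)
    lpf, pos = [0] * n, [-1] * n
    for i in range(n):
        for j in range(i):
            # Extend the match between s[j:] and s[i:] as far as possible.
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > lpf[i]:  # strict '>' keeps the leftmost j on ties
                lpf[i], pos[i] = length, j
    return lpf, pos
```

For `"abab"` this returns LPF `[0, 0, 2, 1]` and POS `[-1, -1, 0, 1]`. A compressor can then replace a long previous factor with a (POS, LPF) pair instead of spelling it out.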
“…In vertical compression, instead of using, like a dictionary, already visited segments of the input stream, the search for repetitive segments is restricted to other sequences available as references [52]. One particular case is tools RCC, GeCo [47], SLF [62] and DNAComp [73] that include the input stream (the target sequence) in the search space. In this way, besides using only the reference sequence as search space, it uses information from the target sequence being compressed.…”
Section: First Order Mapping (mentioning)
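To illustrate vertical compression with the target included in the search space, the sketch below performs a generic greedy, LZ77-style parse: at each position it looks for the longest match in the reference and in the already-parsed part of the target, and emits a copy token or a literal. This is an assumption-laden illustration, not how RCC, GeCo, SLF or DNAComp actually encode data; the function names, the token format and the `min_len` threshold are all hypothetical.

```python
def greedy_reference_parse(reference: str, target: str, min_len: int = 4) -> list:
    """Greedily factor target into copy tokens and literals (illustrative only).

    Tokens: ("ref", pos, length), ("self", pos, length) or ("lit", char).
    """
    tokens, i = [], 0
    while i < len(target):
        best, best_len = None, 0
        for src, text in (("ref", reference), ("self", target)):
            # Self-matches must start before i so the decoder has produced them.
            starts = len(text) if src == "ref" else i
            for p in range(starts):
                length = 0
                while (i + length < len(target) and p + length < len(text)
                       and text[p + length] == target[i + length]):
                    length += 1
                if length > best_len:
                    best, best_len = (src, p, length), length
        if best is not None and best_len >= min_len:
            tokens.append(best)
            i += best_len
        else:
            tokens.append(("lit", target[i]))
            i += 1
    return tokens


def decode_reference_parse(reference: str, tokens: list) -> str:
    """Invert greedy_reference_parse."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        elif tok[0] == "ref":
            _, p, n = tok
            out.extend(reference[p:p + n])
        else:
            _, p, n = tok
            # Copy one character at a time: the source may overlap the output.
            for k in range(n):
                out.append(out[p + k])
    return "".join(out)
```

Searching the target's own prefix lets repeats inside the target be encoded even when the reference lacks them; for instance, `greedy_reference_parse("", "AAAAAAAA")` yields one literal followed by a single overlapping self-copy.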
“…The LCS of two strings is a subsequence that appears in both strings of maximal length [11]. The LCS has applications in many areas of computing, such as data compression [12], speech and signal processing [13], pattern recognition [14], spell checking [15], bioinformatics and computational biology [16], file comparison [17], computational linguistic analysis [18], combinatorial optimization [19] and text sentiment classification [20,21]. Different variants of LCS algorithm have been introduced in [22].…”
Section: Introduction (mentioning)
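The definition in the quote can be made concrete with the textbook dynamic program, which fills a table of prefix LCS lengths in O(mn) time for inputs of lengths m and n and then walks back through the table to recover one LCS. This is the classical baseline, not the suffix-tree/DAG method of the paper above; the function name `lcs` is illustrative.

```python
def lcs(a: str, b: str) -> str:
    """Textbook O(len(a) * len(b)) dynamic program with traceback."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the prefixes a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Walk back from dp[m][n] to recover one LCS (it need not be unique).
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

For `lcs("ABCBDAB", "BDCABA")` the result has length 4; which of the several optimal subsequences is returned depends on the tie-breaking rule in the traceback.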
“…sequence comparison and genome compression [14]) and has thus been extensively studied in computer science [15–17]. Finding a LCS for multiple input sequences has been proved NP-hard [18], while its pairwise counterpart is polynomial and often used in comparative genomics [19–21]. But as far as we know, the LCS problem has never been defined on bucket orders.…”
Section: Introduction (mentioning)
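The gap between the pairwise and multiple-sequence cases becomes visible when the same recurrence is generalized to k sequences: the number of states is the product of (length + 1) over all inputs, which is polynomial for a fixed k (k = 2 gives the usual quadratic table) but grows exponentially with k. That is consistent with the NP-hardness result for an arbitrary number of sequences. The hypothetical function `lcs_length_k` below computes only the length and does not address the bucket-order setting raised in the quote.

```python
from functools import lru_cache


def lcs_length_k(*seqs: str) -> int:
    """LCS length of k >= 1 sequences by memoised recursion over index tuples."""
    if not seqs:
        raise ValueError("need at least one sequence")

    @lru_cache(maxsize=None)
    def solve(idx: tuple) -> int:
        # An exhausted sequence leaves nothing more in common.
        if any(i == len(s) for i, s in zip(idx, seqs)):
            return 0
        first = seqs[0][idx[0]]
        # If every current symbol agrees, matching them is always optimal.
        if all(s[i] == first for i, s in zip(idx, seqs)):
            return 1 + solve(tuple(i + 1 for i in idx))
        # Otherwise skip the current symbol of one sequence at a time.
        return max(
            solve(idx[:t] + (idx[t] + 1,) + idx[t + 1:]) for t in range(len(seqs))
        )

    return solve(tuple(0 for _ in seqs))
```

With two arguments this reproduces the pairwise LCS length; each extra sequence multiplies the state space by roughly that sequence's length.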