2013
DOI: 10.1007/978-3-642-41062-8_26

Longest Common Subsequence in k Length Substrings

Abstract: In this paper we define a new problem, motivated by computational biology, LCSk aiming at finding the maximal number of k length substrings, matching in both input strings while preserving their order of appearance. The traditional LCS definition is a special case of our problem, where k = 1. We provide an algorithm, solving the general case in O(n^2) time, where n is the length of the input strings, equaling the time required for the special case of k = 1. The space requirement of the algorithm is O(kn). We …
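To make the problem statement in the abstract concrete, here is a minimal sketch of a quadratic-time dynamic program for LCSk. It is an illustration written from the definition above, not the authors' code: the function name `lcsk` and the table names `run` and `best` are hypothetical, and it assumes the matched k-length substrings must not overlap. For simplicity it stores full tables, so it uses O(nm) space rather than the O(kn) space stated in the abstract.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """Maximal number of non-overlapping k-length substrings that match
    in both strings in the same order (illustrative sketch only)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n, m = len(a), len(b)
    # run[i][j]: length of the longest common suffix of a[:i] and b[:j]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    # best[i][j]: LCSk value for the prefixes a[:i] and b[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
            # Option 1: skip the last character of either prefix.
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            # Option 2: a k-length match ends exactly at (i, j).
            if run[i][j] >= k:
                best[i][j] = max(best[i][j], best[i - k][j - k] + 1)
    return best[n][m]
```

With k = 1 the recurrence reduces to the classic LCS recurrence, which matches the abstract's remark that LCS is the special case k = 1. Keeping only the last k + 1 rows of each table would bring the space closer to the stated O(kn) bound.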

citations
Cited by 19 publications
(20 citation statements)
references
References 18 publications
“…For this reason, a subset of anchors that satisfy the monotonicity condition needs to be selected. The problem of identifying such a subset can be expressed as finding the Longest Common Subsequence in k Length Substrings [27] (LCSk). Note that this is distinct from just finding the longest common subsequence as that ignores the information determined in the anchors and can favour alignments that have many more indels.…”
Section: Methods (mentioning)
confidence: 99%
“…It is clearly evident from Figure 2 that the proposed algorithm requires the least possible computational time to compute LCSS of any two data sets, with a constant length and varying dimensionality, against field proven schemes i.e., sequential approach [9] and dynamic programming based algorithms [8], [27], [37]. Additionally, if similarity indexes of any two data sets, both real and benchmark, is high then performance of the proposed scheme is exceptionally well as shown in Figure 3 such as if S_i and T_j are completely similar then the proposed approach computes their LCSS in O(m) time, where m represents the data set with maximum length.…”
Section: Results (mentioning)
confidence: 91%
“…The current implementation uses a hardcoded seed that is 12 bases long with an indel/mismatch allowed in the middle (6 matching bases, 1 indel/mismatch base, followed by 6 matching bases). GraphMap then collects seed hits, using them for finding the longest common subsequence in k-length substrings (Benson et al., 2013). The output from this step is then filtered to find collinear chains of seeds (private correspondence with Ivan Sović).…”
Section: Long Read Overlap Methodologies (mentioning)
confidence: 99%