Abstract. The Longest Common Subsequence (LCS) is a well-studied problem with a wide range of implementations. Its motivation is in comparing strings. It has long been of interest to devise a similar measure for comparing higher-dimensional objects and more complex structures. In this paper we study the Longest Common Substructure of two matrices and show that this problem is NP-hard. We also study the Longest Common Subforest problem for multiple trees, including a constrained version. We show NP-hardness for k > 2 unordered trees in the constrained LCS. We also give polynomial-time algorithms for ordered trees and prove a lower bound for any decomposition strategy for k trees.
The Longest Common Subsequence (LCS) of two strings A, B is a well-studied problem with a wide range of applications. When each symbol of the input strings is assigned a positive weight, the problem becomes the Heaviest Common Subsequence (HCS) problem. In this paper we consider a different version of weighted LCS on Position Weight Matrices (PWM). The Position Weight Matrix was introduced as a tool to handle a set of sequences that are not identical, yet have many local similarities. Such a weighted sequence is a 'statistical image' of this set, where we are given the probability of every symbol's occurrence at every text location. We consider two possible definitions of LCS on PWM. For the first, we solve the LCS problem of z sequences in time O(zn^{z+1}). For the second, we consider the log-probability version of the problem, prove NP-hardness and provide an approximation algorithm.
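As a rough illustration of the HCS problem mentioned above (not the PWM variants the abstract goes on to study), the following is a minimal sketch of the textbook dynamic program. It assumes the weight depends only on the symbol itself; the function name `heaviest_common_subsequence` and the `weights` dictionary are hypothetical names chosen for this example.

```python
from typing import Dict


def heaviest_common_subsequence(a: str, b: str, weights: Dict[str, float]) -> float:
    """Return the maximum total weight of a common subsequence of a and b.

    Assumption for this sketch: every symbol has one positive weight,
    looked up in `weights`, regardless of where it occurs.
    """
    n, m = len(a), len(b)
    # best[i][j] = heaviest common subsequence weight of a[:i] and b[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Option 1: drop the last symbol of one of the two prefixes.
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            # Option 2: match the last symbols and collect their weight.
            if a[i - 1] == b[j - 1]:
                candidate = best[i - 1][j - 1] + weights[a[i - 1]]
                best[i][j] = max(best[i][j], candidate)
    return best[n][m]
```

With all weights set to 1 this reduces to the ordinary LCS length; with unequal weights the heaviest subsequence may be shorter than the longest one.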
In this paper we define a new similarity measure, LCSk, which aims at finding the maximal number of k-length substrings that match in both input strings while preserving their order of appearance. The traditional LCS is the special case k = 1. We examine this generalization in both theory and practice. We first describe its basic solution and give experimental evidence on real data for its ability to differentiate between sequences that are considered similar according to the LCS measure. We then examine extensions of the LCSk definition to LCS in at least k-length substrings (LCS ≥ k) and to 2-dimensional LCSk, and also define the complementary EDk and ED ≥ k distances.
In this paper we define a new problem, motivated by computational biology: LCSk, which aims at finding the maximal number of k-length substrings that match in both input strings while preserving their order of appearance. The traditional LCS definition is a special case of our problem, where k = 1. We provide an algorithm solving the general case in O(n^2) time, where n is the length of the input strings, equaling the time required for the special case of k = 1. The space requirement of the algorithm is O(kn). We also define a complementary EDk distance measure and show that EDk(A, B) can be computed in O(nm) time and O(km) space, where m, n are the lengths of the input sequences A and B respectively.
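The LCSk definition can be illustrated with a small dynamic program. The sketch below is one straightforward way to compute the measure in O(nm) time, assuming the matched k-length substrings must not overlap in either string. It keeps full tables for readability, so it does not achieve the O(kn) space bound stated above, and the function name `lcsk` is a hypothetical choice for this example.

```python
def lcsk(a: str, b: str, k: int) -> int:
    """Return the maximum number of order-preserving, non-overlapping
    k-length substring matches between a and b (k = 1 gives plain LCS)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n, m = len(a), len(b)
    # run[i][j]  = length of the longest common suffix of a[:i] and b[:j]
    # best[i][j] = LCSk value for the prefixes a[:i] and b[:j]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
            # Either ignore the last symbol of one of the prefixes ...
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            # ... or end a k-length match exactly at positions i and j.
            if run[i][j] >= k:
                best[i][j] = max(best[i][j], best[i - k][j - k] + 1)
    return best[n][m]
```

For example, `lcsk("ABCD", "ABCD", 2)` counts the two matches `AB` and `CD`, while `lcsk("ABCD", "ABCD", 3)` can fit only one 3-length match.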
Abstract. The dictionary matching with gaps problem is to preprocess a dictionary D of d gapped patterns P1, ..., Pd over alphabet Σ, where each gapped pattern Pi is a sequence of subpatterns separated by bounded sequences of don't cares. Then, given a query text T of length n over alphabet Σ, the goal is to output all locations in T in which a pat- There is renewed interest in the gapped matching problem stemming from cyber security. In this paper we solve the problem where all patterns in the dictionary have one gap with at least α and at most β don't cares, where α and β are given parameters. Specifically, we show that the dictionary matching with a single gap problem can be solved in either O(d log d + |D|) preprocessing time and O(d log^ε d + |D|) space, with query time O(n(β − α) log log d log^2 min{d, log |D|} + occ), or O(d^2 + |D|) preprocessing time and space, with query time O(n(β − α) + occ), where occ is the number of patterns found. As far as we know, this is the best solution for this setting of the problem, where many overlaps may exist in the dictionary.
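To make the single-gap setting concrete, here is a naive brute-force matcher. It is only a sketch for understanding the problem statement and is not the algorithm whose bounds are quoted above. It assumes each pattern is given as a pair of non-empty subpatterns and that an occurrence is reported by the text index where the second subpattern ends; the function name `match_single_gap` and that reporting convention are assumptions made for this example.

```python
from typing import List, Tuple


def match_single_gap(
    text: str,
    dictionary: List[Tuple[str, str]],
    alpha: int,
    beta: int,
) -> List[Tuple[int, int]]:
    """Return sorted (end_index, pattern_index) pairs for every pattern
    (left, right) occurring as left, then alpha..beta don't cares, then right."""
    hits = set()
    for idx, (left, right) in enumerate(dictionary):
        if not left or not right:
            raise ValueError("subpatterns must be non-empty in this sketch")
        start = text.find(left)
        while start != -1:
            left_end = start + len(left)
            # Try every allowed gap length after this occurrence of `left`.
            for gap in range(alpha, beta + 1):
                right_start = left_end + gap
                if text.startswith(right, right_start):
                    hits.add((right_start + len(right) - 1, idx))
            # Move on to the next (possibly overlapping) occurrence of `left`.
            start = text.find(left, start + 1)
    return sorted(hits)
```

This brute-force version rescans the text once per pattern, which is the cost that preprocessing the dictionary is meant to avoid.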