This paper re-examines, in a unified framework, two classic approaches to the problem of finding a longest common subsequence (LCS) of two strings, and proposes faster implementations for both. Let l be the length of an LCS between two strings of length m and n ≥ m, respectively, and let s be the alphabet size. The first revised strategy follows the paradigm of a previous O(ln) time algorithm by Hirschberg. The new version can be implemented in time O(lm · min{log s, log m, log(2n/m)}), which is profitable when the input strings differ considerably in size (a looser bound for both versions is O(mn)). The second strategy improves on the Hunt-Szymanski algorithm. The latter takes time O((r + n) log n), where r ≤ mn is the total number of matches between the two input strings. Such a performance is quite good (O(n log n)) when r ~ n, but it degrades to O(mn log n) in the worst case. On the other hand, the variation presented here is never worse than linear-time in the product mn. The exact time bound derived for this second algorithm is O(m log n + d log(2mn/d)), where d ≤ r is the number of dominant matches (elsewhere referred to as minimal candidates) between the two strings. Both algorithms require an O(n log s) preprocessing that is nearly standard for the LCS problem, and they make use of simple and handy auxiliary data structures.
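For orientation, the baseline that the second strategy improves on can be sketched as follows. This is a minimal Python rendering of the classic Hunt-Szymanski threshold technique, which processes every match and therefore runs in O((r + n) log n) time; it is not the dominant-match variant proposed in the paper. The function name `lcs_length_hunt_szymanski` and the `thresh` array are illustrative choices, not taken from the source.

```python
from bisect import bisect_left
from collections import defaultdict


def lcs_length_hunt_szymanski(a, b):
    """Length of an LCS of a and b via the classic threshold technique."""
    # Preprocessing: for each symbol, the increasing list of its positions in b.
    occ = defaultdict(list)
    for j, ch in enumerate(b):
        occ[ch].append(j)

    # thresh[k] is the smallest index j such that the prefix of a read so far
    # and b[:j + 1] have a common subsequence of length k + 1.
    thresh = []
    for ch in a:
        # Visit the matches of this row in decreasing j, so that an update
        # made for this row cannot be extended by another match of the same row.
        for j in reversed(occ.get(ch, ())):
            k = bisect_left(thresh, j)  # binary search: the log n factor
            if k == len(thresh):
                thresh.append(j)
            else:
                thresh[k] = j
    return len(thresh)


if __name__ == "__main__":
    print(lcs_length_hunt_szymanski("ABCBDAB", "BDCABA"))
```

Every one of the r matches triggers a binary search here, which is where the O(mn log n) worst case comes from when almost all pairs of positions match.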
The string editing problem for input strings x and y consists of transforming x into y by performing a series of weighted edit operations on x of overall minimum cost. An edit operation on x can be the deletion of a symbol from x, the insertion of a symbol in x or the substitution of a symbol of x with another symbol. This problem has a well known O(|x||y|) time sequential solution [25]. We give efficient PRAM parallel algorithms for the string editing problem. If m = min(|x|, |y|) and n = max(|x|, |y|), then our CREW bound is O(log m log n) time with O(mn/log m) processors. Our CRCW bound is O(log n (log log m)^2) time with O(mn/log log m) processors. In all algorithms, space is O(mn). Key words and phrases: String-to-string correction, edit distances, spelling correction, longest common subsequence, shortest paths, grid graphs, analysis of algorithms, parallel computation, cascading divide-and-conquer. AMS subject classification: 68Q25.
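The O(|x||y|) sequential solution mentioned above is the standard dynamic program over the edit grid. The sketch below illustrates that baseline only, not the parallel algorithms of the paper. It assumes, for simplicity, one fixed cost per operation type; the function name `edit_cost` and its `delete`, `insert` and `substitute` parameters are hypothetical, and the general problem allows symbol-dependent weights.

```python
def edit_cost(x, y, delete=1, insert=1, substitute=1):
    """Minimum total cost of transforming x into y (sequential DP sketch).

    Assumption: each operation type has one fixed cost; the weights are
    illustrative parameters, not taken from the paper.
    """
    # prev[j] = cost of transforming the already processed prefix of x into y[:j].
    prev = [j * insert for j in range(len(y) + 1)]
    for i, cx in enumerate(x, start=1):
        cur = [i * delete]  # transforming x[:i] into the empty string
        for j, cy in enumerate(y, start=1):
            cur.append(min(
                prev[j] + delete,                               # delete cx
                cur[j - 1] + insert,                            # insert cy
                prev[j - 1] + (0 if cx == cy else substitute),  # match or substitute
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(edit_cost("kitten", "sitting"))
```

Only two rows of the grid are kept because this sketch returns the cost alone; recovering the edit script itself would need the full table or a divide-and-conquer scheme.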
Based on the Boyer-Moore-Galil approach, a new algorithm is proposed which requires a number of character comparisons bounded by 2n, regardless of the number of occurrences of the pattern in the textstring. Preprocessing is only slightly more involved and still requires a time linear in the pattern size.
The Web Graph is a large-scale graph that does not fit in main memory, so lossless compression methods have been proposed for it. This paper introduces a compression scheme that combines efficient storage with fast retrieval of the information in a node. The scheme exploits the properties of the Web Graph without assuming an ordering of the URLs, so that it may be applied to more general graphs. Tests on some datasets in common use achieve space savings of about 10% over existing methods.
A string w covers another string z if every position of z is within some occurrence of w in z. Clearly, every string is covered by itself. A string that is covered only by itself is superprimitive. We show that the property of being superprimitive is testable on a string of n symbols in O(n) time and space.
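The two definitions above can be made concrete with a brute-force check. The sketch below tests every proper prefix of z as a candidate cover, so it is far from the O(n) bound claimed in the abstract and does not reproduce the paper's method. The names `covers` and `is_superprimitive` are illustrative, and treating the empty string as not superprimitive is an assumption made here.

```python
def covers(w, z):
    """True if every position of z lies within some occurrence of w in z."""
    if not w or len(w) > len(z):
        return False
    covered_upto = 0  # positions z[:covered_upto] are known to be covered
    start = z.find(w)
    while start != -1:
        if start > covered_upto:
            return False  # position covered_upto lies in no occurrence of w
        covered_upto = start + len(w)
        start = z.find(w, start + 1)  # overlapping occurrences are allowed
    return covered_upto == len(z)


def is_superprimitive(z):
    """True if z is covered only by itself (naive check, not linear time)."""
    # A cover must occur at position 0, so only proper prefixes are candidates.
    # Assumption: the empty string is reported as not superprimitive.
    return bool(z) and not any(covers(z[:k], z) for k in range(1, len(z)))


if __name__ == "__main__":
    print(covers("aba", "ababa"), is_superprimitive("ababa"))
```

For instance, "aba" covers "ababa" through its overlapping occurrences at positions 0 and 2, so "ababa" is not superprimitive, whereas "abaab" has no proper cover.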