Since 1977, when Lempel and Ziv described a kind of string factorization useful for text compression, a succession of algorithms has been proposed for computing the "LZ factorization". In particular, several recently proposed algorithms extend the usefulness of LZ factorization, for example to the computation of maximal repetitions. In this article, we provide an overview of these new algorithms and compare their efficiency in terms of time and space usage.
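To make the object being computed concrete, here is a minimal sketch of LZ factorization. It assumes the common variant in which each factor is either a letter not seen before or the longest prefix of the remaining text that also starts at an earlier position, with the earlier occurrence allowed to overlap the factor. The function name `lz_factorize` is hypothetical, and the repeated `str.find` search is deliberately naive; the algorithms surveyed here use suffix arrays and related structures to avoid this cost.

```python
def lz_factorize(s: str) -> list[str]:
    """Naive LZ factorization (earlier occurrences may overlap the factor)."""
    factors: list[str] = []
    n = len(s)
    i = 0
    while i < n:
        length = 0
        # Grow the factor while s[i:i+length+1] also starts at some j < i.
        # str.find(sub, 0, end) only reports matches lying inside s[0:end];
        # end = i + length forces a match of the (length+1)-char probe to start before i.
        while i + length < n and s.find(s[i:i + length + 1], 0, i + length) != -1:
            length += 1
        if length == 0:
            length = 1  # s[i] has not occurred before: it becomes a one-letter factor
        factors.append(s[i:i + length])
        i += length
    return factors


print(lz_factorize("abaababaab"))  # ['a', 'b', 'a', 'aba', 'baab']
```

Concatenating the factors always gives back the input; on `"aaaa"` the sketch returns `['a', 'aaa']`, which shows the overlapping (self-referential) case.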
Given a string x = x[1..n] on an alphabet of size α, and a threshold p_min ≥ 1, we describe four variants of an algorithm PSY1 that, using a suffix array, computes all the complete nonextendible repeats in x of length p ≥ p_min. The basic algorithm PSY1-1 and its simple extension PSY1-2 are fast on strings that arise in biological, natural-language and other applications (that is, strings that are not highly periodic), while PSY1-3 guarantees Θ(n) worst-case execution time. The final variant, PSY1-4, also achieves Θ(n) processing time and, over the complete range of strings tested, is the fastest of the four. The space requirement of all four algorithms is about 5n bytes, but all make use of the "longest common prefix" (LCP) array, whose construction requires about 6n bytes. The four algorithms are faster in applications and use less space than a recently proposed algorithm that produces equivalent output. The suffix array is not explicitly used by algorithms PSY1, but may be required for postprocessing; in this case, storage requirements rise to 9n bytes. We also describe two variants of a fast Θ(n)-time algorithm PSY2 for computing all complete supernonextendible repeats in x.
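The general pipeline behind this kind of repeat finding (suffix array, then LCP array, then a scan for repeats of length at least p_min) can be sketched as follows. This is a minimal illustration, not PSY1: it builds the suffix array by naive sorting, computes the LCP array with the linear-time method of Kasai et al., and then enumerates LCP intervals with a stack, the standard bottom-up technique for reporting maximal repeats. We assume that a complete nonextendible repeat is a repeated substring, reported with all of its occurrences, that cannot be extended one letter to the left or right while keeping every occurrence. The function names are hypothetical, and positions are 0-based rather than 1-based.

```python
def suffix_array(x: str) -> list[int]:
    # Naive construction: sort starting positions by their suffixes (fine for a sketch).
    return sorted(range(len(x)), key=lambda i: x[i:])


def lcp_array(x: str, sa: list[int]) -> list[int]:
    # Kasai et al.: lcp[r] = length of the longest common prefix of
    # suffixes sa[r-1] and sa[r]; lcp[0] = 0. Runs in O(n) time.
    n = len(x)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order
        r = rank[i]
        if r == 0:
            h = 0
            continue
        j = sa[r - 1]
        while i + h < n and j + h < n and x[i + h] == x[j + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1  # suffix i+1 shares at least h - 1 characters with its predecessor
    return lcp


def maximal_repeats(x: str, p_min: int = 1) -> list[tuple[str, list[int]]]:
    # Report (repeat, sorted 0-based positions) for every repeat of length >= p_min
    # that is right-maximal (an LCP interval) and left-maximal.
    sa = suffix_array(x)
    lcp = lcp_array(x, sa)
    n = len(x)
    result: list[tuple[str, list[int]]] = []
    stack = [(0, 0)]  # (lcp value, left boundary) of still-open intervals
    for r in range(1, n + 1):
        cur = lcp[r] if r < n else 0  # the final 0 closes every open interval
        left = r - 1
        while stack[-1][0] > cur:
            value, left = stack.pop()
            # sa[left:r] are exactly the suffixes beginning with the same length-`value` prefix
            if value >= p_min:
                positions = sorted(sa[left:r])
                before = {x[p - 1] if p > 0 else None for p in positions}
                if len(before) > 1 or None in before:  # left-maximal
                    result.append((x[positions[0]:positions[0] + value], positions))
        if stack[-1][0] < cur:
            stack.append((cur, left))
    return result
```

For example, `maximal_repeats("banana")` returns `[('ana', [1, 3]), ('a', [1, 3, 5])]`. The repeat `"na"` is not reported because both of its occurrences are preceded by `a`, so it extends to `"ana"`. The sketch keeps the suffix array in memory throughout, whereas the abstract notes that PSY1 does not use it explicitly.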