In this paper, we propose a new dynamic compressed index of $O(w)$ space for a dynamic text $T$, where $w = O(\min(z \log N \log^* M, N))$ is the size of the signature encoding of $T$, $z$ is the size of the Lempel-Ziv77 (LZ77) factorization of $T$, $N$ is the length of $T$, and $M \geq 4N$ is an integer that can be handled in constant time under the word RAM model. Our index supports searching for a pattern $P$ in $T$ in $O(|P| f_A + \log w \log |P| \log^* M (\log N + \log |P| \log^* M) + occ \log N)$ time and insertion/deletion of a substring of length $y$ in $O((y + \log N \log^* M) \log w \log N \log^* M)$ time, where $f_A = O\left(\min\left\{\frac{\log\log M \log\log w}{\log\log\log M}, \sqrt{\frac{\log w}{\log\log w}}\right\}\right)$. Also, we propose a new space-efficient LZ77 factorization algorithm for a given text of length $N$, which runs in $O(N f_A + z \log w \log^3 N (\log^* N)^2)$ time with $O(w)$ working space.
Lossless data compression has been widely studied in computer science. One of the most widely used lossless compression methods is Lempel-Ziv 77 (LZ77) parsing, which achieves a high compression ratio. Bidirectional (a.k.a. macro) parsing is a lossless compression method that computes a sequence of phrases, each copied from another substring (the target phrase) located either to the left or to the right in the input string. Gagie et al. (LATIN 2018) recently showed that a large gap exists between the number of phrases in the smallest bidirectional parse of a given string and the number of LZ77 phrases. In addition, finding the smallest bidirectional parse of a given text is NP-complete. Several variants of bidirectional parsing have been proposed thus far, but no prior work on bidirectional parsing has achieved a number of phrases smaller than that of LZ77 parsing for any string. In this paper, we present the first practical bidirectional parsing, named LZ77 parsing with right reference (LZRR), in which the number of LZRR phrases is theoretically guaranteed to be smaller than the number of LZ77 phrases. Experimental results using benchmark strings show that the number of LZRR phrases is approximately five percent smaller than that of LZ77 phrases.
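To make the phrase counts compared above concrete, the following is a minimal sketch of a greedy LZ77 factorization. It assumes one common variant of the definition (each phrase is either a fresh character or the longest prefix of the remaining text that also starts at an earlier position, with overlap allowed); the abstracts do not say which variant they use. The function names `lz77_factorize` and `lz77_decode` are illustrative, and the naive search is far slower than the algorithms discussed here.

```python
def lz77_factorize(text):
    """Greedy LZ77 factorization by brute force (illustrative only).

    Returns a list whose items are either a single character (a fresh
    symbol) or a pair (source_position, length) copying from the left.
    Self-overlapping copies are allowed (assumed variant).
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        # Try every earlier starting position as a copy source.
        for j in range(i):
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(text[i])          # character not seen before
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Rebuild the text; copying one character at a time handles overlap."""
    out = []
    for p in phrases:
        if isinstance(p, str):
            out.append(p)
        else:
            src, length = p
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


if __name__ == "__main__":
    parse = lz77_factorize("abababab")
    print(parse, "z =", len(parse))
```

A bidirectional parse differs in that a copy source may also lie to the right of the phrase, which is why decoding it requires that the references contain no cycles; the sketch above covers only the left-referencing case.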
We address a variant of the dictionary matching problem where the dictionary is represented by a straight line program (SLP). For a given SLP-compressed dictionary $D$ of size $n$ and height $h$ representing $m$ patterns of total length $N$, we present an $O(n^2 \log N)$-size representation of the Aho-Corasick automaton which recognizes all occurrences of the patterns in $D$ in amortized $O(h + m)$ running time per character. We also propose an algorithm to construct this compressed Aho-Corasick automaton in $O(n^3 \log n \log N)$ time and $O(n^2 \log N)$ space. In the special case where $D$ represents only a single pattern, we present an $O(n \log N)$-size representation of the Morris-Pratt automaton which permits us to find all occurrences of the pattern in amortized $O(h)$ running time per character, and we show how to construct this representation in $O(n^3 \log n \log N)$ time with $O(n^2 \log N)$ working space.
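For reference, the classical (uncompressed) Morris-Pratt automaton that the single-pattern result compresses can be sketched as follows. This is the textbook construction over an explicitly stored pattern, not the SLP-based representation of the abstract; the names `mp_failure` and `mp_search` are illustrative, and a non-empty pattern is assumed.

```python
def mp_failure(pattern):
    """Failure table: fail[i] is the length of the longest proper border
    of pattern[:i], with fail[0] = -1 as a sentinel."""
    fail = [0] * (len(pattern) + 1)
    fail[0] = -1
    k = -1
    for i, c in enumerate(pattern):
        # Fall back through shorter borders until one can be extended by c.
        while k >= 0 and pattern[k] != c:
            k = fail[k]
        k += 1
        fail[i + 1] = k
    return fail


def mp_search(pattern, text):
    """Return the start positions of all occurrences of a non-empty
    pattern in text, reading text one character at a time."""
    fail = mp_failure(pattern)
    m = len(pattern)
    state = 0            # number of pattern characters currently matched
    hits = []
    for pos, c in enumerate(text):
        # Follow failure links while the current state cannot read c.
        while state >= 0 and (state == m or pattern[state] != c):
            state = fail[state]
        state += 1
        if state == m:
            hits.append(pos - m + 1)
    return hits


if __name__ == "__main__":
    print(mp_search("aba", "ababa"))
```

In this uncompressed setting each text character costs amortized constant time, since every failure-link step undoes an earlier increment of `state`; the $O(h)$ bound in the abstract reflects the extra cost of working with the pattern only through its SLP.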