We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result, we combine several recent findings, including string attractors (new combinatorial objects encompassing most known compressibility measures for highly repetitive texts) and grammars based on locally consistent parsing. In more detail, let γ be the size of the smallest attractor for a text T of length n. The measure γ is an (asymptotic) lower bound on the size of dictionary compressors based on Lempel–Ziv, context-free grammars, and many others. The smallest known text representations in terms of attractors use space O(γ log(n/γ)), and our lightest indexes work within the same asymptotic space. Let ε > 0 be a suitably small constant fixed at construction time, m be the pattern length, and occ be the number of its text occurrences. Our index counts pattern occurrences in O(m + log^{2+ε} n) time and locates them in O(m + (occ+1) log^ε n) time. These times already outperform those of most dictionary-compressed indexes, while obtaining the least asymptotic space for any index searching within O((m + occ) polylog n) time. Further, by increasing the space to O(γ log(n/γ) log^ε n), we reduce the locating time to the optimal O(m + occ), and within O(γ log(n/γ) log n) space we can also count in optimal O(m) time. No dictionary-compressed index had obtained this time before. All our indexes can be constructed in O(n) space and O(n log n) expected time. As a by-product of independent interest, we show how to build, in O(n) expected time and without knowing the size γ of the smallest attractor (which is NP-hard to find), a run-length context-free grammar of size O(γ log(n/γ)) generating (only) T. As a result, our indexes can be built without knowing γ.
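The attractor property can be checked directly from its definition: a set Γ of positions is a string attractor of T if every distinct substring of T has at least one occurrence covering some position of Γ. The following brute-force sketch (illustrative only, cubic-time, not part of the paper's machinery) makes the definition concrete:

```python
def is_attractor(T, gamma):
    """Check whether the position set `gamma` is a string attractor of T:
    every distinct substring must have an occurrence covering a position
    in gamma. Brute force, for illustration on short strings only."""
    gamma = set(gamma)
    n = len(T)
    seen = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            sub = T[i:j]
            if sub in seen:
                continue
            seen.add(sub)
            # scan all occurrences of sub, looking for one that crosses gamma
            covered = False
            start = T.find(sub)
            while start != -1:
                if any(start <= p < start + len(sub) for p in gamma):
                    covered = True
                    break
                start = T.find(sub, start + 1)
            if not covered:
                return False
    return True
```

For example, {1, 2} is an attractor of "abab" (every substring has an occurrence touching position 1 or 2), while {1} is not, since no occurrence of "a" covers position 1.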
Given a string S, the compressed indexing problem is to preprocess S into a compressed representation that supports fast substring queries. The goal is to use little space relative to the compressed size of S while supporting fast queries. We present a compressed index based on the Lempel–Ziv 1977 (LZ77) compression scheme and obtain new time-space trade-offs for constant-sized alphabets, where n and m are the lengths of the input string and query string, respectively, z is the number of phrases in the LZ77 parse of the input string, occ is the number of occurrences of the query in the input, and ε > 0 is an arbitrarily small constant. In particular, (i) improves the leading term in the query time of the previous best solution from O(m lg m) to O(m) at the cost of increasing the space by a factor lg lg z. Alternatively, (ii) matches the previous best space bound, but has a leading term in the query time of O(m(1 + lg^ε z / lg(n/z))). However, for any polynomial compression ratio, i.e., z = O(n^{1−δ}) for constant δ > 0, this becomes O(m). Our index also supports extraction of any substring of length ℓ in O(ℓ + lg(n/z)) time. Technically, our results are obtained by novel extensions and combinations of existing data structures of independent interest, including a new batched variant of weak prefix search.
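The measure z above counts the phrases of the greedy LZ77 parse: each phrase is the longest prefix of the remaining suffix that also occurs starting at an earlier position, extended by one fresh character. A minimal quadratic-time sketch (illustrative, self-referential variant; real parsers use suffix-based structures to run in near-linear time):

```python
def lz77_parse(S):
    """Greedy (self-referential) LZ77 parse of S. Each phrase is the
    longest prefix of the remaining suffix whose leftmost occurrence
    starts before the current position, plus one new character."""
    phrases, i, n = [], 0, len(S)
    while i < n:
        l = 0
        # extend the copied part while S[i:i+l+1] occurs starting before i
        while i + l + 1 <= n and S.find(S[i:i + l + 1]) < i:
            l += 1
        phrases.append(S[i:i + l + 1])  # copied part + one literal char
        i += l + 1
    return phrases
```

For instance, "abababa" parses into the z = 3 phrases "a", "b", "ababa" (the last phrase copies from an overlapping earlier occurrence, which the self-referential variant permits).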
The compressed indexing problem is to preprocess a string S of length n into a compressed representation that supports pattern matching queries; that is, given a string P of length m, report all occurrences of P in S. We present a data structure that supports pattern matching queries in O(m + occ(lg lg n + lg^ε z)) time using O(z lg(n/z)) space, where z is the size of the LZ77 parse of S and ε > 0 is an arbitrarily small constant, when the alphabet is small or z = O(n^{1−δ}) for any constant δ > 0. We also present two data structures for the general case: one where the space is increased by O(z lg lg z), and one where the query time changes from worst-case to expected. These results improve the previously best known solutions. Notably, this is the first data structure that decides whether P occurs in S in O(m) time using O(z lg(n/z)) space. Our results are mainly obtained by a novel combination of a randomized grammar construction algorithm with well-known techniques relating pattern matching to 2D range reporting.
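The reduction to 2D range reporting mentioned above targets the "primary" occurrences, those crossing a phrase boundary: each boundary b is mapped to the point (reverse of S[:b], S[b:]), and each pattern split P = P[:k] + P[k:] selects a rectangle of matching boundaries when both coordinates are sorted. The sketch below is illustrative only (the range-reporting structure is replaced by sorted arrays and a linear scan over candidates, and all names are ours); occurrences lying entirely inside a phrase are found by a separate secondary mechanism in the real index:

```python
import bisect

def build_boundary_index(S, boundaries):
    """Map each phrase boundary b to the point (reverse(S[:b]), S[b:], b),
    sorted by the first coordinate so prefix conditions become ranges."""
    return sorted((S[:b][::-1], S[b:], b) for b in boundaries)

def prefix_range(keys, p):
    """Half-open range of sorted `keys` that start with p (assumes the
    alphabet stays below U+FFFF, a common plain-text simplification)."""
    return (bisect.bisect_left(keys, p),
            bisect.bisect_left(keys, p + "\uffff"))

def primary_occurrences(S, boundaries, P):
    """Report occurrences of P crossing one of the given boundaries by
    trying every split P = P[:k] + P[k:]; the scan over candidates is a
    brute-force stand-in for the 2D-range-reporting query."""
    by_x = build_boundary_index(S, boundaries)
    xs = [x for x, _, _ in by_x]
    occs = set()
    for k in range(1, len(P)):
        lo, hi = prefix_range(xs, P[:k][::-1])  # boundaries ending with P[:k]
        for _, y, b in by_x[lo:hi]:
            if y.startswith(P[k:]):             # ...and continuing with P[k:]
                occs.add(b - k)
    return sorted(occs)
```

With S = "abababa" and boundaries {1, 2} (from its LZ77 parse), the pattern "aba" has its boundary-crossing occurrence reported at position 0.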
We describe the solution used by the Python-DTU team in the Multi-Agent Programming Contest 2011, where the scenario was called Agents on Mars. We present our auction-based agreement algorithm and discuss our chosen strategy and our choice of technology for implementing the system. Finally, we present an analysis of the results of the competition and propose areas for improvement.