Philip Bille scite author profile

We consider labeling schemes for trees, supporting various relationships between nodes at small distance. For instance, we show that given a tree T and an integer k we can assign labels to each node of T such that given the label of two nodes we can decide, from these two labels alone, if the distance between v and w is at most k and, if so, compute it. For trees with n nodes and k ≥ 2, we give a lower bound on the maximum label length of log n + Ω(log log n) bits, and for constant k, we give an upper bound of log n+O(log log n). Bounds for ancestor, sibling, connectivity, and bi-and triconnectivity labeling schemes are also presented.

show abstract

Random Access to Grammar-Compressed Strings and Trees

Bille¹,

Landau²,

Raman³

et al. 2015

SIAM J. Comput.

115

View full text Add to dashboard Cite

Abstract. Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures (sometimes with slight reduction in efficiency) many of the popular compression schemes, including the Lempel-Ziv family, Run-Length Encoding, Byte-Pair Encoding, Sequitur, and Re-Pair. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string.Let S be a string of length N compressed into a context-free grammar S of size n. We present two representations of S achieving O(log N ) random access time, and either O(n·α k (n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α k (n) is the inverse of the k th row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P |k, k 4 + |P |} + log N ) + occ), where occ is the number of occurrences of P in S. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees.All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars.Key words. grammar-based compression, straight-line program, approximate string matching, tree compression AMS subject classifications. 68P05, 68P301. Introduction. Modern textual or semi-structured databases, e.g. for biological and WWW data, are huge, and are typically stored in compressed form. A query to such databases will typically retrieve only a small portion of the data. This presents several challenges: how to query the compressed data directly and efficiently, without the need for additional data structures (which can be many times larger than the compressed data), and how to retrieve the answers to the queries. In many practical cases, the naive approach of first decompressing the entire data and then processing it is completely unacceptable -for instance XML data compresses by an order of magnitude on disk [25] but expands by an order of magnitude when represented in-memory [22]; as we will shortly see, this approach is very problematic from an

show abstract

Random Access to Grammar-Compressed Strings

Bille¹,

Landau²,

Raman³

et al. 2011

View full text Add to dashboard Cite

Let S be a string of length N compressed into a contextfree grammar S of size n. We present two representations of S achieving O(log N ) random access time, and either O(n · α k (n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α k (n) is the inverse of the k th row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P |k, k 4 + |P |} + log N ) + occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees.All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy-paths in grammars.

show abstract

Tree Compression with Top Trees

Bille

Gørtz

Landau

et al. 2013

View full text Add to dashboard Cite

We introduce a new compression scheme for labeled trees based on top trees. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast navigational queries directly on the compressed representation. We show that the new compression scheme achieves close to optimal worst-case compression, can compress exponentially better than DAG compression, is never much worse than DAG compression, and supports navigational queries in logarithmic time.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Philip Bille

A survey on tree edit distance and related problems

Labeling Schemes for Small Distances in Trees

Random Access to Grammar-Compressed Strings and Trees

Random Access to Grammar-Compressed Strings

Tree Compression with Top Trees

Contact Info

Product

Resources

About