Tree Compression with Top Trees

Bille, Philip; Gørtz, Inge Li; Landau, Gad M.; Weimann, Oren

doi:10.1007/978-3-642-39206-1_14

Cited by 22 publications

(75 citation statements)

References 25 publications

(39 reference statements)


“…A top-tree compression [4] of a tree T is a DAG compression of T's top-tree T. Bille et al. [4] showed how to construct a data structure whose size is linear in the size of the DAG of T and supports navigational queries on T in time linear in the depth of T.…”

Section: Our Results and Techniques

mentioning

confidence: 99%

Compressed Range Minimum Queries

Gawrychowski, Jo, Mozes et al. 2019

Preprint

Self Cite


Given a string S of n integers in [0, σ), a range minimum query RMQ(i, j) asks for the index of the smallest integer in S[i..j]. It is well known that the problem can be solved with a succinct data structure of size 2n + o(n) and constant query time. In this paper we show how to preprocess S into a compressed representation that allows fast range minimum queries. This allows for sublinear-size data structures with logarithmic query time. The most natural approach is to use string compression and construct a data structure for answering range minimum queries directly on the compressed string. We investigate this approach in the context of grammar compression. We then consider an alternative approach. Instead of compressing S using string compression, we compress the Cartesian tree of S using tree compression. We show that this approach can be exponentially better than the former, is never worse by more than an O(σ) factor (i.e., for constant alphabets it is never asymptotically worse), and can in fact be worse by an Ω(σ) factor.
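The link between range minimum queries and the Cartesian tree of S can be illustrated with a short Python sketch. It is not the compressed data structure from the paper: it builds an ordinary, uncompressed Cartesian tree and answers RMQ(i, j) by walking down from the root, so a query costs time proportional to the tree depth rather than logarithmic or constant time. The function names `build_cartesian_tree` and `rmq` are illustrative assumptions, and ties are broken toward the leftmost minimum.

```python
def build_cartesian_tree(s):
    """Build the min-rooted Cartesian tree of s in O(n) time.

    Returns (root, left, right), where left[k] and right[k] are the
    indices of the children of position k, or -1 if there is none.
    """
    n = len(s)
    left = [-1] * n
    right = [-1] * n
    stack = []  # indices on the rightmost path; their values never decrease
    for k in range(n):
        last = -1
        # Pop every strictly larger value: those nodes form the left subtree of k.
        while stack and s[stack[-1]] > s[k]:
            last = stack.pop()
        left[k] = last
        if stack:
            right[stack[-1]] = k
        stack.append(k)
    root = stack[0] if stack else -1
    return root, left, right


def rmq(root, left, right, i, j):
    """Return the index of the (leftmost) minimum of s[i..j], inclusive.

    The first node on the path from the root whose index lies in [i, j]
    is the lowest common ancestor of i and j, which is the range minimum.
    """
    node = root
    while not (i <= node <= j):
        node = right[node] if node < i else left[node]
    return node
```

For example, with `s = [3, 1, 4, 1, 5, 9, 2, 6]`, calling `rmq(*build_cartesian_tree(s), 4, 7)` returns the position of the value 2. The answer depends only on the shape of the tree, which is why compressing the tree instead of the string is a viable alternative.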


“…The random access problem is to preprocess a data set into a compressed representation that supports fast retrieval of any part of the data without decompressing the entire data set. The random access problem is a well-studied problem for many types of data and compression schemes [1,3,5,8,9,19,31,35,41,48,53], and random access queries are a basic primitive in several algorithms and data structures on compressed data; see, e.g., [7,9,23,24,25]. In this paper, we consider the random access problem on collections of strings where each string is the result of an edit operation, i.e., inserting, deleting, or replacing a single character, from another string in the collection. Specifically, our collection is given by a rooted tree, called a version tree, where edges are labeled by an edit operation and a node represents the string obtained by applying the sequence of edit operations on the path from the root to the node (see Figure 1(a)).…”

Section: Introduction

mentioning

confidence: 99%

“…[4,4], (v_1, v_5) in [8,12], and (v_5, v_6) in [9,11]. (c) The segment selection instance corresponding to (a).…”

mentioning

confidence: 99%


Random Access in Persistent Strings and Segment Selection

Bille, Gørtz 2020

Preprint

Self Cite


We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where edges are labeled by an edit operation (inserting, deleting, or replacing a character) and a node represents the string obtained by applying the sequence of edit operations on the path from the root to the node. The goal is to compactly represent the entire collection while supporting fast random access to any part of a string in the collection. This problem captures natural scenarios such as representing the history of an edited document or representing highly repetitive collections. Given a tree with n nodes, we show how to represent the corresponding collection in O(n) space and optimal O(log n / log log n) query time. This improves the previous time-space trade-offs for the problem. To obtain our results, we introduce new techniques and ideas, including a reduction to a new geometric line segment selection problem together with an efficient solution.
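The version-tree model described above can be made concrete with a naive Python baseline. It is only a sketch of the problem statement, not the O(n)-space structure from the paper: a query replays every edit on the root-to-node path, so its cost grows with the depth of the node. The class `Version`, its fields, the functions `materialize` and `access`, and the choice of an empty string at the root are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """A node of a version tree; the edit labels the edge from its parent."""
    parent: Optional["Version"] = None
    op: str = ""    # "insert", "delete" or "replace" (unused for the root)
    pos: int = 0    # position the edit applies to
    char: str = ""  # character inserted or written (unused for "delete")


def materialize(node):
    """Return the string at `node` by replaying edits from the root.

    Assumption: the root represents the empty string.
    """
    path = []
    while node.parent is not None:
        path.append(node)
        node = node.parent
    chars = []
    for v in reversed(path):  # apply edits in root-to-node order
        if v.op == "insert":
            chars.insert(v.pos, v.char)
        elif v.op == "delete":
            del chars[v.pos]
        elif v.op == "replace":
            chars[v.pos] = v.char
        else:
            raise ValueError(f"unknown edit operation: {v.op!r}")
    return "".join(chars)


def access(node, i):
    """Naive random access: character i of the string at `node`."""
    return materialize(node)[i]
```

A small usage example: with `root = Version()`, `a = Version(root, "insert", 0, "a")` and `b = Version(a, "insert", 1, "b")`, the node `b` represents the string built by inserting "a" and then "b", and a sibling branch such as `Version(b, "delete", 0)` shares the whole prefix of edits with its relatives. That sharing is what a compact representation can exploit.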


“…A tree is one of the most important and popular structures in computing, used to represent the relations between nodes. Therefore, there has been considerable research on succinct representation of trees while allowing various operations on these trees to be efficiently performed, see (Chen and Reif, 1996), (Bille et al., 2013), (Jacobson, 1989) and (Benoit et al., 1999). The eXtensible Markup Language, XML (XML, 2013), is one of the most popular data formats for the serialization of tree data structures and for the storage of relational data.…”

Section: Introduction

mentioning

confidence: 99%

Annotated Trees and their Applications to XML Compression

Müldner, Miziołek, Corbin 2014

Proceedings of the 10th International Conference on Web Information Systems and Technologies


Permutation-based XML-conscious compressors permute the input document to improve the compression ratio and support efficient operations, such as queries or updates. One such compressor, XSAQCT, uses the properties of the permuted document, called an annotated tree, to support these operations. This paper provides the formal background for the definition of an annotated tree of an XML document D. It also provides an algorithm for creating an annotated tree for the XML document and its reverse algorithm, and discusses a measure of compressibility using an annotated tree. The theoretical and algorithmic approaches are followed by experimental results showing the compressibility of annotated trees and a general analysis of semi-structured data and XML compression.

