We present a new graph compressor that works by recursively detecting repeated substructures and representing them through grammar rules. We show that for a large number of graphs the compressor obtains smaller representations than other approaches. Specific queries such as reachability between two nodes or regular path queries can be evaluated over the grammar in linear time (or quadratic time, respectively), thus allowing speed-ups proportional to the compression ratio.

Locality: most links lead to pages within the same host (i.e., the URLs have the same prefix), and Similarity: pages on the same host often share the same links. Due to these properties, ordering the nodes lexicographically by their URL provides an order in which similar nodes are close to each other. The WebGraph framework [4] by Boldi and Vigna was originally based on this order, but was later improved with a different order [3]. It represents the adjacency list of a graph using several layers of encodings, while retaining the ability to answer out-neighborhood queries. An out-neighborhood (in-neighborhood) query applied to a node u retrieves all nodes v such that there is an edge from u to v (from v to u). As not every graph is a web graph, a lexicographical order of the node names is not always possible or useful. Apostolico and Drovandi therefore propose to use a BFS order [2] combined with another encoding. A different approach is proposed by Grabowski and Bieniecki [24], where contiguous blocks of the adjacency list are merged into a single ordered list, together with a list of flags that is used to recover the original lists. They then encode the ordered list and use the deflate compressor to compress both lists. To our knowledge, their method is the current state of the art in compression/query trade-off when only out-neighborhood queries are considered. The methods above have in common that they encode the adjacency list of a graph and natively support only out-neighborhood queries.
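The two query types defined above can be illustrated on a plain, uncompressed adjacency list. This is a minimal sketch and not the encoding of any of the cited methods; the graph and the function names `out_neighbors` and `in_neighbors` are assumptions made for the example. It shows why an adjacency-list representation answers out-neighborhood queries directly, while an in-neighborhood query needs a scan over all lists unless extra data is stored.

```python
from typing import Dict, List

# Hypothetical example graph: each node maps to the list of its successors.
adjacency: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
}


def out_neighbors(adj: Dict[str, List[str]], u: str) -> List[str]:
    """All v with an edge u -> v: a direct lookup in u's own list."""
    return list(adj[u])


def in_neighbors(adj: Dict[str, List[str]], v: str) -> List[str]:
    """All u with an edge u -> v: requires scanning every node's list."""
    return [u for u, targets in adj.items() if v in targets]


out_of_a = out_neighbors(adjacency, "a")
into_c = in_neighbors(adjacency, "c")
```

A common workaround, not described in the text above, is to store the transposed graph as well, at the cost of roughly doubling the space.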
The k²-trees of Brisaboa et al. [5], on the other hand, compress the adjacency matrix of the graph. They do this by recursively partitioning it into k² squares. If one of these includes only 0-values, then it is represented by a 0-leaf in the tree, and otherwise the square is partitioned further. This quadtree-like representation is well known (see, e.g., [55]), but their succinct binary encoding is a clever new approach. The method provides access to both in- and out-neighborhood queries, and can be applied to any binary relation. We use k²-trees to represent the start graph of our grammars. The k²-tree method was combined by Hernández and Navarro [27] with dense substructure removal, originally proposed by Buehrer and Chellapilla [6]. A dense substructure is defined by two sets of nodes U, S such that they induce a complete bipartite graph. Note that U and S need not be disjoint. The edges in these bicliques are replaced by a single "virtual node". To our knowledge, the method of [27] is the current state-of-the-art in compression/query trade-off, when in...
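The recursive partitioning described above can be sketched with an ordinary pointer-based tree. This is only an illustration under stated assumptions: it uses nested Python lists rather than the succinct binary encoding of [5], it assumes a square 0/1 matrix whose side length is a power of k, and the names `build`, `out_neighbors` and `in_neighbors` as well as the example matrix are hypothetical.

```python
from typing import List, Union

# A node is 0 (all-zero square), 1 (a single 1-cell), or a list of k*k children.
Node = Union[int, list]


def build(matrix: List[List[int]], row: int, col: int, size: int, k: int = 2) -> Node:
    """Recursively partition the size x size square at (row, col) into k*k squares."""
    cells = (matrix[x][y] for x in range(row, row + size) for y in range(col, col + size))
    if not any(cells):
        return 0  # the square contains only 0-values: a 0-leaf
    if size == 1:
        return 1
    sub = size // k
    return [build(matrix, row + i * sub, col + j * sub, sub, k)
            for i in range(k) for j in range(k)]


def out_neighbors(node: Node, size: int, u: int, k: int = 2, col0: int = 0) -> List[int]:
    """Columns holding a 1 in row u: follow the k children that cover that row."""
    if node == 0:
        return []
    if size == 1:
        return [col0]
    sub = size // k
    i = u // sub
    found: List[int] = []
    for j in range(k):
        found += out_neighbors(node[i * k + j], sub, u % sub, k, col0 + j * sub)
    return found


def in_neighbors(node: Node, size: int, v: int, k: int = 2, row0: int = 0) -> List[int]:
    """Rows holding a 1 in column v: the symmetric traversal over columns."""
    if node == 0:
        return []
    if size == 1:
        return [row0]
    sub = size // k
    j = v // sub
    found: List[int] = []
    for i in range(k):
        found += in_neighbors(node[i * k + j], sub, v % sub, k, row0 + i * sub)
    return found


# Hypothetical graph with edges 0->1, 0->3, 2->1, 3->0.
M = [[0, 1, 0, 1],
     [0, 0, 0, 0],
     [0, 1, 0, 0],
     [1, 0, 0, 0]]
tree = build(M, 0, 0, 4)
```

Because rows and columns are treated symmetrically, both query directions use the same traversal, which matches the remark that the representation supports in- and out-neighborhood queries alike.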
We present a new graph compressor that detects repeating substructures and represents them by grammar rules. We show that for a large number of graphs the compressor obtains smaller representations than other approaches. For RDF graphs and version graphs it outperforms the best known previous methods. Specific queries, such as reachability between two nodes, can be evaluated in linear time over the grammar, thus allowing speed-ups proportional to the compression ratio.
Abstract. Straight-line (linear) context-free tree (SLT) grammars have been used to compactly represent ordered trees. It is well known that equivalence of SLT grammars is decidable in polynomial time. Here we extend this result and show that isomorphism of unordered trees given as SLT grammars is decidable in polynomial time. The proof constructs a compressed version of the canonical form of the tree represented by the input SLT grammar. The result is generalized to unrooted trees by "re-rooting" the compressed trees in polynomial time. We further show that bisimulation equivalence of unrooted unordered trees represented by SLT grammars is decidable in polynomial time. For non-linear SLT grammars which can have double-exponential compression ratios, we prove that unordered isomorphism is PSPACE-hard and in EXPTIME. The same complexity bounds are shown for bisimulation equivalence.
We present a pointer-based data structure for constant time traversal of the edges of an edge-labeled (alphabet Σ) directed hypergraph (a graph where edges can be incident to more than two vertices, and the incident vertices are ordered) given as hyperedge-replacement grammar G. It is assumed that the grammar has a fixed rank κ (maximal number of vertices connected to a nonterminal hyperedge) and that each vertex of the represented graph is incident to at most one σ-edge per direction (σ ∈ Σ). Precomputing the data structure needs O(|G||Σ|κrh) space and O(|G||Σ|κrh²) time, where h is the height of the derivation tree of G and r is the maximal rank of a terminal edge occurring in the grammar.