We focus on succinct data structures, that is, on time- and space-efficient representations of trees and other combinatorial objects that dominate the memory requirements of most sophisticated programs and systems.
SUCCINCT DATA STRUCTURES, WHAT ARE THEY?

This paper deals with space-efficient data structures and their potential for application in Symbolic Computation. In particular we focus on succinct data structures, which are best thought of as representations of combinatorial objects, such as trees, in a number of bits close to the information-theoretic minimum, that support performing the "natural" operations (for trees, that would include finding parent, first child, next child, subtree size, etc.) quickly, ideally in time independent of the size of the structure. Such representations can have a dramatic effect on the space, and indeed time, requirements of common computational problems.

Consider, for example, the representation of a binary tree on four billion nodes. The number of binary trees on $n$ nodes is the $n$th Catalan number, $C_n = \binom{2n}{n}/(n+1) \approx 4^n/(\sqrt{\pi}\,n^{3/2})$, so the information-theoretic bound on the space to represent such a tree is the lg (or logarithm base 2) of this number, i.e. $2n - O(\lg n)$ bits. Succinct structures can get away with about $2n + n \lg\lg n/\lg n$ bits. If $n$ is four billion, the lower bound is about 8 billion bits, or a gigabyte, and succinct structures use somewhat under 9 gigabits. Using a standard representation of $\lg n$ bits for a pointer, and requiring references to left child, right child, parent and leftmost descendant (useful especially in text indexing), as well as the size of the subtree rooted at each node, we would need $5n \lg n$ bits. For $n$ = 4 billion, this is 640 gigabits, about 70 times that of the succinct representation. Space differences such as this can change the level of memory on which much of a structure is stored, and hence the time required.

This paper is intended to give the reader a brief and informal overview and pointers to papers on a variety of aspects of the subject. It is definitely the personal perspective of the author.
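As a rough check on the space figures quoted above for n = 4 billion, the following Python sketch evaluates the three quantities. The function names are illustrative and not from the paper. It assumes that a pointer occupies lg n rounded up to a whole number of bits (32 here), and that the succinct figure is exactly the 2n + n lg lg n / lg n expression given in the text, with no hidden constant.

```python
import math


def lg_catalan(n: int) -> float:
    """lg of the n-th Catalan number, C_n = binom(2n, n) / (n + 1).

    Uses log-gamma so that C_n itself is never formed for huge n.
    """
    ln_c = math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1) - math.log(n + 1)
    return ln_c / math.log(2)


def succinct_bits(n: int) -> float:
    """The 2n + n lg lg n / lg n figure quoted in the text (assumed exact)."""
    lg_n = math.log2(n)
    return 2 * n + n * math.log2(lg_n) / lg_n


def pointer_bits(n: int, fields: int = 5) -> int:
    """Pointer-based layout: `fields` words per node, each ceil(lg n) bits.

    The five fields are left child, right child, parent, leftmost
    descendant, and subtree size, as listed in the text.
    """
    return fields * n * math.ceil(math.log2(n))


if __name__ == "__main__":
    n = 4_000_000_000
    print(f"lower bound : {lg_catalan(n) / 1e9:.3f} gigabits")
    print(f"succinct    : {succinct_bits(n) / 1e9:.3f} gigabits")
    print(f"pointers    : {pointer_bits(n) / 1e9:.3f} gigabits")
    print(f"ratio       : {pointer_bits(n) / succinct_bits(n):.1f}")
```

Under these assumptions the pointer-based layout comes to 640 gigabits and the succinct estimate falls between 8 and 9 gigabits, giving a ratio a little above 70, in line with the rounded figures in the text.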
The forthcoming book, Compact Data Structures: A Practical Approach by Gonzalo Navarro [30], is an excellent source on the topic. He uses the term in his title somewhat more generally than "succinct data structures", to mean structures using dramatically less storage than conventional implementations. As the title suggests, the book also focuses on techniques appropriate for use on today's machines and problem sizes, rather than on asymptotics as such. The surveys noted [27,29] also provide readable summaries of the highlights. Most of the work, especially the work on which we will focus, has been on static objects, as the methods are simpler. There has, however, been substantial work on dynamic structures. We will comment briefly on this (mostly by giving a few references) in the appropriate sections.
THE BASICS

The term "succinct data structure" first appears as the title of Guy Jacobson's Ph.D. thesis [20]. Most of the key results of that work also appear in [19], which, ...