The manipulation of large sequence data is one of the most important problems in string processing. In this paper, we discuss a new data structure for storing and manipulating sets of strings, called Sequence Binary Decision Diagrams (sequence BDDs), which has recently been introduced by Loekito et al. (Knowl. Inf. Syst., 24(2), 235-268, 2009) as a descendant of both acyclic DFAs (ADFAs) and binary decision diagrams (BDDs). Sequence BDDs can compactly represent sets of sequences similarly to minimal ADFAs, and allow efficient set operations inherited from BDDs. We study fundamental properties of sequence BDDs, such as the characterization of minimal sequence BDDs by reduced sequence BDDs, non-trivial relationships between sizes of minimal sequence BDDs and minimal ADFAs, the complexities of minimization, Boolean set operations, and sequence BDD construction. We also show experimental results for real and artificial data sets.
In this article, we disprove the long-standing conjecture, proposed by R. E. Bryant in 1986, that his binary decision diagram (BDD) algorithm computes any binary operation on two Boolean functions in linear time in the input-output sizes. We present Boolean functions for which the time required by Bryant's algorithm is a quadratic of the input-output sizes for all nontrivial binary operations, such as ∧, ∨, and ⊕. For the operations ∧ and ∨, we show an even stronger counterexample where the output BDD size is constant, but the computation time is still a quadratic of the input BDD size. In addition, we present experimental results to support our theoretical observations.
Zero-suppressed Binary Decision Diagrams (ZDDs) are data structures for representing set families in a compressed form. With ZDDs, many valuable operations on set families can be done in time polynomial in ZDD size. In some cases, however, the size of ZDDs for representing large set families becomes too huge to store them in the main memory. This paper proposes top ZDD, a novel representation of ZDDs which uses less space than existing ones. The top ZDD is an extension of the top tree, which compresses trees, to compress directed acyclic graphs by sharing identical subgraphs. We prove that navigational operations on ZDDs can be done in time poly-logarithmic in ZDD size, and show that there exist set families for which the size of the top ZDD is exponentially smaller than that of the ZDD. We also show experimentally that our top ZDDs have smaller sizes than ZDDs for real data.
Abstract:In this article, we propose a succinct data structure of zero-suppressed binary decision diagrams (ZDDs). A ZDD represents sets of combinations efficiently and we can perform various set operations on the ZDD without explicitly extracting combinations. Thanks to these features, ZDDs have been applied to web information retrieval, information integration, and data mining. However, to support rich manipulation of sets of combinations and update ZDDs in the future, ZDDs need too much space, which means that there is still room to be compressed. The paper introduces a new succinct data structure, called DenseZDD, for further compressing a ZDD when we do not need to conduct set operations on the ZDD but want to examine whether a given set is included in the family represented by the ZDD, and count the number of elements in the family. We also propose a hybrid method, which combines DenseZDDs with ordinary ZDDs. By numerical experiments, we show that the sizes of our data structures are three times smaller than those of ordinary ZDDs, and membership operations and random sampling on DenseZDDs are about ten times and three times faster than those on ordinary ZDDs for some datasets, respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.