We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [SODA'00], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on the theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files, not just continuously operating sources; and also a new, off-line training algorithm, based on a link to the asymmetric traveling salesman problem, which improves on prior work by rearranging columns prior to partitioning. We demonstrate these results experimentally. On various test files, the on-line algorithms provide 35-55% improvement over gzip with negligible slowdown; the off-line reordering provides up to 20% further improvement over partitioning alone. We also show that a variation of the table compression problem is MAX-SNP hard.
We present a new linear-time algorithm to find the immediate dominators of all vertices in a flowgraph. Our algorithm is simpler than previous linear-time algorithms: rather than employ complicated data structures, we combine the use of microtrees and memoization with new observations on a restricted class of path compressions. We have implemented our algorithm, and we report experimental results that show that the constant factors are low. Compared to the standard, slightly superlinear algorithm of Lengauer and Tarjan, which has much less overhead, our algorithm runs 10-20% slower on real flowgraphs of reasonable size and only a few percent slower on very large flowgraphs.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.