The Burrows-Wheeler Transform (BWT) produces a permutation of a string X, denoted X*, by sorting the n cyclic rotations of X into full lexicographical order and taking the last column of the resulting n × n matrix to be X*. The transformation is reversible in O(n) time. In this paper, we consider an alteration to the process, called k-BWT, where rotations are only sorted to a depth k. We propose new approaches to the forward and reverse transforms, and show that the methods are efficient in practice. More than a decade ago, two algorithms were independently discovered for reversing k-BWT, both of which run in O(nk) time. Two recent algorithms have lowered the bounds for the reverse transformation to O(n log k) and O(n), respectively. We examine the practical performance of these reversal algorithms. We find that the original O(nk) approach is the most efficient in practice, and we investigate new approaches, aimed at further speeding up reversal, which store precomputed context boundaries in the compressed file. By explicitly encoding the context boundaries, we present an O(n) reversal technique that is both efficient and effective. Finally, our study elucidates an inherently cache-friendly (and hitherto unobserved) behavior in the reverse k-BWT, which could lead to new applications of the k-BWT transform. In contrast to previous empirical studies, we show that the partial transform can be reversed significantly faster than the full transform, without significantly affecting compression effectiveness.

… approach in which the n rotations are only partially sorted to a fixed prefix depth, k. We refer to this modified transform as k-BWT. By limiting the sort depth to k, sorting can be accomplished in O(nk) time using radix sort and is very fast in practice. Moreover, Schindler reports nearly identical compression effectiveness to the full transform, even for small values of k.
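To make the depth-k sort concrete, here is a minimal Python sketch of the forward k-BWT using an LSD radix sort: k stable bucket passes over symbol offsets k-1 down to 0, giving O(k(n + σ)) time for alphabet size σ. The function name `k_bwt` is hypothetical, and this is not the induced-sorting algorithm the paper proposes. Because every pass is stable, rotations whose first k symbols agree stay in text order; this tie-breaking rule is an assumption of the sketch.

```python
def k_bwt(text: str, k: int) -> str:
    """Sketch of the forward k-BWT: sort cyclic rotations by their first k symbols.

    LSD radix sort: one stable bucket pass per symbol offset, from offset k-1
    down to 0, so the whole sort takes O(k * (n + sigma)) time.
    """
    m = len(text)
    k = min(k, m)                      # sorting deeper than one full rotation adds nothing
    alphabet = sorted(set(text))
    order = list(range(m))             # rotation start positions, initially in text order
    for d in range(k - 1, -1, -1):
        buckets = {c: [] for c in alphabet}
        for p in order:                # stable: keeps the order produced by earlier passes
            buckets[text[(p + d) % m]].append(p)
        order = [p for c in alphabet for p in buckets[c]]
    # Last column: the symbol cyclically preceding each sorted rotation
    # (text[-1] when p == 0).
    return "".join(text[p - 1] for p in order)
```

For example, `k_bwt("banana$", 2)` yields `"anbn$aa"`, while `k_bwt("banana$", 7)` yields the full BWT `"annb$aa"`. The outputs differ because the rotations `anana$b` and `ana$ban` share the 2-symbol prefix `an` and are left in text order when k = 2.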
The algorithms developed by Schindler [13] were subsequently made available in the general-purpose compression tool szip. However, simplifying the forward k-BWT comes at a cost: the reverse transform becomes more expensive, at least in theory.

Our contribution: First, we describe an efficient forward k-BWT algorithm based on induced sorting techniques from suffix array construction [15]. Second, we present a practical O(n)-time k-BWT reversal algorithm that explicitly stores context boundaries. Third, we provide the first thorough empirical analysis of state-of-the-art k-BWT algorithms, covering the forward and inverse transforms, compression effectiveness, and the associated trade-offs. Finally, we identify a previously undocumented locality-of-access property inherent to k-BWT algorithms, which allows fast transform reversal for small k.
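To show why context boundaries matter for reversal, the Python sketch below gives a naive k-BWT inverse. It is intended for checking small examples and is not the paper's O(n) method. The name `inverse_k_bwt` is hypothetical. The sketch assumes that the input text ends with a unique `$` that sorts before every other symbol, and that ties between equal k-prefixes were broken by text position. It recovers each row's k-context through k rounds of "prepend the last column, then sort". The rows where that context changes are the context boundaries. The sketch then decodes the text backwards. Because rows that share a k-context are stored in increasing text position, a backward walk visits them from the bottom of the group up.

```python
def inverse_k_bwt(last: str, k: int) -> str:
    """Naive k-BWT reversal sketch (not the paper's O(n) method).

    Assumes the original text ended with a unique '$' that sorts before
    every other symbol, and that ties between equal k-prefixes were broken
    by text position.
    """
    m = len(last)
    k = min(k, m)
    # Step 1: recover each row's k-context. After j rounds of "prepend the
    # last column, then sort", prefixes[r] holds the first j symbols of row r.
    prefixes = [""] * m
    for _ in range(k):
        prefixes = sorted(c + p for c, p in zip(last, prefixes))
    # Step 2: rows with equal k-context form a contiguous group (the context
    # boundaries separate groups); record the last row of each group.
    next_free = {}
    for row, ctx in enumerate(prefixes):
        next_free[ctx] = row
    # Step 3: walk the text backwards from row 0 (the rotation starting at '$').
    # Text positions are visited in decreasing order, so each group hands out
    # its rows from the bottom up.
    row, out = 0, []
    for _ in range(m - 1):
        out.append(last[row])                    # symbol preceding the current rotation
        ctx = last[row] + prefixes[row][:k - 1]  # k-context of that preceding rotation
        row = next_free[ctx]
        next_free[ctx] -= 1
    return "".join(reversed(out)) + "$"
```

For example, `inverse_k_bwt("anbn$aa", 2)` returns `"banana$"`. The repeated sorting in step 1 makes this far slower than O(nk). A practical decoder would avoid that cost, for instance by storing the context boundaries in the compressed file as the paper describes.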
BACKGROUND AND NOTATION

Let X = X[0..n] = X[0]X[1]...X[n] be a string (or text) of n + 1 symbols, where the first n symbols of X are drawn from an alphabet Σ and comprise the actual input; X[n] = $ is a unique "end-of-string" symbol that is defined to be lexicographically smaller than all symbols in Σ. The stri...