“…In terms of granularity, accelerators can exploit bit-wise sparsity via bit-serial computation [1,31], unstructured element-wise sparsity in either activations or weights [2,5,6,8,11,20,29,38], or structured sparsity via a co-designed pruning algorithm [17,37,41]. BitPruner [39] applies structured bit-wise pruning to benefit bit-serial architectures. Our approach also falls into the structured pruning category, but with one key distinction: the pruning framework is tightly co-designed with the dataflow.…”