We propose two orthogonal approaches to logarithmic cyclic shifter design. The first method, called fanout splitting, replaces multiplexers in a conventional design with demultiplexers which have two fanouts driving the shifting and non-shifting paths separately. The use of demultiplexers has a two-fold effect; it cuts the accumulated wire load on the critical path from O(N log 2 (N )) to O(N ), and reduces the switching probabilities on the inter-stage long wires from 1/4 to 3/16. We then perform cell order optimization to further improve the delay, and formulate it as an integer linear programming problem. For the 64-bit case, the two approaches together reduce the total delay by 67.1% and dynamic power consumption by 17.6%, respectively.