With the escalation of clock frequencies and the increasing ratio of wire delays to gate delays, clock skew is a major problem to be overcome in tomorrow's high-speed VLSI chips. In addition, as more stages switch simultaneously, peak power consumption rises. In our past work, we proposed a novel scheme called Counterflow-Clocked (C²) Pipelining to combat these problems, and discussed methods for composing C² pipelined stages. In this paper, we analyze in detail the timing constraints to be obeyed in designing basic C² pipelined stages as well as in composing them. C² pipelining is well suited to systems that exhibit mostly uni-directional data flows and possess mostly nearest-neighbor connections. We illustrate C² pipelining for such systems with several design examples. C² pipelining eases the distribution of high-speed clocks, shortens the clock period by eliminating global clock signals, allows natural use of level-sensitive dynamic latches, and generates less internal switching noise because latch operation is uniformly distributed. By applying C² pipelining and its composition methods to build a system, VLSI designers can replace the global clock skew problem with many local one-sided delay constraints.

With the escalation of clock frequencies and the increasing ratio of wire delays to gate delays, clock skew is a major problem to be overcome in today's high-speed VLSI chips. Clock skew should ideally be less than 5-10% of the system clock cycle time [1]; this is a difficult figure to attain in many modern chips [2] and will become more so with the impending GHz rate of clocking [3]. Shrinking VLSI feature sizes will increase this disparity [4] in the future, especially since in submicron CMOS, interconnection delays are going to be larger than gate-propagation delays [5].
Consequently, an increased percentage of the clock period will be devoted to clock skew margins [6, 7]. The faster the clock and the bigger the die size, the worse the clock skew effects will be.

A major concern when building high-performance VLSI systems is to build an effective clock distribution network. Many clock distribution methods for large high-speed VLSI chips have been developed [1] to achieve rigid synchronization (tight skew control) over the chip. Clock distribution networks of high-speed systems normally comprise binary trees of clock buffers [2, 8], which are expensive in terms of area and design time. Network implementations such as H-tree methods [7] have been commonly exploited to reduce the clock skew. The effort to limit skews has an unfortunate side effect: it causes the latches to switch almost simultaneously, causing ground bounce and power-supply droop, both of which can lead to chip malfunction. This often necessitates on-chip and off-chip decoupling capacitors [1], both of which add to the design cost.

Rigidly clocked synchronous systems are often those that support a variety of data movements b...