The 64b PowerPC RISC microprocessor previously described is migrated from a 0.22µm SOI technology to a 0.18µm SOI technology [1]. Key features of the 0.77-scaled 1.5V technology are 0.08µm NFET channel lengths, 7-layer Cu metallization with low-ε dielectric, a low-dose SOI substrate for improved material quality and productivity, and local interconnect. Dual gate oxide provides high-voltage I/O compatibility. Because this chip is a migration, only 6 levels of metal were used, and stacked devices were used for high-voltage I/O.
Silicon-on-insulator (SOI) technology allows higher performance than bulk technology. However, the floating-body effect in SOI devices poses challenges through history effects, bipolar currents, and lower noise margins on dynamic circuits. This 64b adder is used to compute the effective address in a PowerPC™ processor. Particular emphasis is on design issues, advantages resulting from unique SOI device structures, and the techniques for controlling the floating-body effect in partially-depleted devices. Adder performance is compared for bulk CMOS, first-generation SOI CMOS, and second-generation SOI CMOS.

High-speed microprocessor cycle time is limited by a number of critical paths, one of which involves a cache memory access initiated by a load or store instruction. For instance, an instruction of the form lwaux RT, RC, RB (PowerPC™ Load Word Algebraic with Update Indexed) requires the value in register RC to be added to the value in register RB to produce the effective address (EA). The effective address is used to access the translation lookaside buffer (TLB) and the cache memory (Figure 17.1.1).
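The role of the adder in this path can be illustrated with a small behavioral model. The following is a minimal sketch only, not the circuit described in the paper: the register file and memory are modeled as plain Python dictionaries, and the helper names (`effective_address`, `sign_extend_32`, `lwaux`) are hypothetical. It assumes the usual meaning of the mnemonic, namely that "algebraic" sign-extends the loaded word and "with update" writes the EA back to the base register.

```python
MASK64 = (1 << 64) - 1  # 64-bit datapath: results wrap modulo 2**64


def effective_address(base: int, index: int) -> int:
    """64-bit add that produces the EA; any carry out of bit 63 is discarded."""
    return (base + index) & MASK64


def sign_extend_32(word: int) -> int:
    """Sign-extend a 32-bit word to a 64-bit two's-complement value."""
    word &= 0xFFFF_FFFF
    return ((word ^ 0x8000_0000) - 0x8000_0000) & MASK64


def lwaux(regs: dict, memory: dict, rt: str, rc: str, rb: str) -> int:
    """Behavioral sketch of `lwaux RT, RC, RB` (hypothetical model).

    regs   : register name -> 64-bit value (illustrative register file)
    memory : address -> 32-bit word (illustrative, word-granular memory)
    """
    ea = effective_address(regs[rc], regs[rb])  # the add on the critical path
    regs[rt] = sign_extend_32(memory[ea])       # "algebraic": sign-extended load
    regs[rc] = ea                               # "with update": base gets the EA
    return ea


if __name__ == "__main__":
    regs = {"RT": 0, "RC": 0xFFFF_FFFF_FFFF_FFF0, "RB": 0x20}
    memory = {0x10: 0xFFFF_FFFE}
    ea = lwaux(regs, memory, "RT", "RC", "RB")
    print(hex(ea), hex(regs["RT"]), hex(regs["RC"]))
```

In this model the TLB and cache lookup cannot begin until `effective_address` returns, which is why the latency of the 64b add sits directly on the load/store critical path.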
A double-precision multiplier for floating-point and media-streaming instructions in the first-generation CELL processor [1] on 90nm PD/SOI is reported. Multiplication by recoding and successive partial-product (PP) compression is completed in three 11FO4 cycles, including merging with the aligner. Figure 20.3.3 shows the micro-architecture of the design. At 1.3V and 68°C, hardware runs at 4.76GHz (Fig. 20.3.1). The multiplier area is 0.19mm², including that of decoupling capacitors. Only regular-Vt devices are used in consideration of variability, leakage, and scalability. Other noted high-speed design points in the 90nm technology are the single-precision [2] and low-FO4 double-precision [3] multipliers.

The first cycle starts with Radix-4 Booth logic whose inputs are two 53b operands. Booth circuits reduce the number of PP rows to 27. To minimize area and latch count, two levels of 3:2 compression in transmission-gate (TG) style circuits are also performed in this cycle. Footless domino circuits are used for complex Booth encoding and muxing functions. Figure 20.3.4 depicts a pruned schematic diagram for the Booth encoder, Booth multiplexer (MUX), and pulse-to-static converter latch.

Static cycles 2 and 3 start with low-latency pulse latches (12 unfolded and 8 folded PP rows, respectively) to maximize cycle-time utilization and minimize clock power. Cycle 2 contains third-level 4:2 compressors (CMPs) and fourth-level 3:2 CMPs. In the third cycle, the fifth-level 4:2 CMP outputs are merged with the outputs from the aligners in the final 3:2 CMPs. To ensure noise immunity, no unbuffered TGs are used. Delay is reduced through customized connections between two compression levels such that the number of inversions in any given path is minimized. Interconnect penalties are minimized by splitting the wiring between the second (row-folding wires) and third (buses over the aligner) cycles.
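The recoding and compression steps described above can be sketched at the word level. This is an illustrative arithmetic model under stated assumptions, not the paper's circuit: all function names are hypothetical, each 4:2 compressor is modeled as two cascaded 3:2 carry-save stages, and row folding, the aligner merge, and the latch boundaries are ignored. Only the 27 Booth rows, the two first-cycle 3:2 levels, and the order of compressor types come from the text; the row counts after the second level are simply what this model produces.

```python
WIDTH = 53                       # operand width from the text
MASK = (1 << (2 * WIDTH)) - 1    # product fits in 106 bits

# Compressor type per level, in the order the text lists them.
SCHEDULE = ["3:2", "3:2", "4:2", "3:2", "4:2"]


def booth_radix4_digits(multiplier: int, width: int = WIDTH) -> list[int]:
    """Recode an unsigned multiplier into radix-4 Booth digits in {-2..2}."""
    if not 0 <= multiplier < (1 << width):
        raise ValueError("multiplier out of range")
    padded = multiplier << 1                 # implicit y[-1] = 0
    digits = []
    for i in range(width // 2 + 1):          # 27 digits for 53 bits
        t = (padded >> (2 * i)) & 0b111      # bits y[2i+1], y[2i], y[2i-1]
        digits.append((t & 1) + ((t >> 1) & 1) - 2 * (t >> 2))
    return digits


def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor (carry-save adder): three rows in, sum and carry rows out."""
    total = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return total & MASK, carry & MASK


def compress_level(rows: list[int], kind: str) -> list[int]:
    """Apply one level of 3:2 or 4:2 compression; leftover rows pass through."""
    group = 3 if kind == "3:2" else 4
    full = len(rows) - len(rows) % group
    out = []
    for i in range(0, full, group):
        s, c = csa(*rows[i:i + 3])
        if kind == "4:2":                    # modeled as two cascaded 3:2 stages
            s, c = csa(s, c, rows[i + 3])
        out.extend((s, c))
    return out + rows[full:]


def multiply(a: int, b: int) -> tuple[int, list[int]]:
    """Return (product, row count after each level) for unsigned 53b operands."""
    digits = booth_radix4_digits(b)
    # Negative Booth digits become two's-complement rows modulo 2**106.
    rows = [((d * a) << (2 * i)) & MASK for i, d in enumerate(digits)]
    counts = [len(rows)]
    for kind in SCHEDULE:
        rows = compress_level(rows, kind)
        counts.append(len(rows))
    return sum(rows) & MASK, counts          # final carry-propagate add


if __name__ == "__main__":
    x, y = (1 << 53) - 1, 0x1F_FFFF_0000_1234
    product, counts = multiply(x, y)
    print(product == x * y, counts)
```

In this model the row count goes 27, 18, 12, 6, 4, 2. The 27 rows match the Booth output stated in the text, and the 12 rows left after two 3:2 levels match the 12 unfolded rows latched at the start of cycle 2. The later counts are a property of this simplified model and should not be read as figures from the design.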
Figure 20.3.5 shows exemplary 3:2 and 4:2 TG CMPs.

Input operand latches convert static inputs to clock-qualified signals for the domino stages. Booth encoders are placed in the central clock bay to minimize delay. Pulsed operand inputs to dynamic stages reduce contention current at various process and operating corners. The design tolerates 10% variation in system clock pulses, i.e., a 40% evaluate or precharge duty cycle, thus enhancing technology and frequency scalability. Besides PFET keepers for dynamic nodes, clock-gated NFET keeper devices are incorporated to sustain the low state, thus allowing low-speed testing and operation under short evaluate pulse conditions. Additionally, a pulse limiter on the clock grid limits evaluation time to 20FO4 at long cycle times. This avoids keeping dynamic nodes in the evaluate state for long periods, so higher leakage and smaller keepers can be tolerated without failure. Long Booth-encoder output wires and ladder-style Booth MUX input connections are shielded from noise. Dynamic output signals are converted to static ones with a mid-cycle converter latch whose input clock is delay inte...