Peripheral I/O data-rates for PCs and mobile computing platforms continue to scale to meet high-bandwidth applications including high-resolution displays and large-capacity external storage. The bandwidth requirements will soon exceed the data-rates of current standards such as PCI Express and USB. A low-power, low-cost serial link that can scale to 32Gb/s per lane is needed for the next-generation peripheral interface. Recent publications have demonstrated 28 to 32Gb/s rates [1][2]. However, their circuit power and channel characteristics are not suitable for mainstream PC and mobile markets. A low-profile connector and cable assembly prototype is developed for these markets, and the link architecture and design are optimized for its channel characteristics. This paper describes a data-rate-scalable 32Gb/s serial link that features a bidirectional transceiver, a source-series terminated (SST) 3-tap FFE, a continuous-time linear equalizer (CTLE) with an active inductor, a 6-tap DFE, and clock calibration and adaptation circuitry.

Figure 26.2.1(a) illustrates the interconnect topology. The primary components are PCBs, packages, connectors and cable. The 8-lane cable assembly consists of consumer-grade 32AWG shielded twisted-pair copper wires. The height and width of the connector are 3mm and 13mm, respectively.

The bidirectional transceiver architecture is shown in Fig. 26.2.1(b). An LC-VCO-based PLL generates a quarter-rate clock that is distributed to each transceiver lane using regulated CMOS buffers similar to [3]. To save power, the transceivers are grouped into bundles of 4 lanes. The DLL and clock multiplier are common to the 4 lanes and are included in the bundle clock circuitry. The TX is based on a half-rate clock architecture, while the RX uses a quarter-rate clock architecture. After the DLL, the quarter-rate clock is multiplied to half-rate and distributed locally to the 4 TXs.
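The clocking arithmetic implied by this description can be sketched as follows. This is a minimal illustration, not part of the design: it assumes only the standard definitions that a quarter-rate clock runs at one quarter of the bit rate and a half-rate clock at one half, and that every cable lane maps to one transceiver lane. The names `ClockPlan`, `clock_plan`, and `bundle_count` are hypothetical.

```python
from dataclasses import dataclass

# From the text: 4 lanes share one DLL and clock multiplier (a "bundle").
LANES_PER_BUNDLE = 4


@dataclass(frozen=True)
class ClockPlan:
    """Clock frequencies implied by a given per-lane data-rate (hypothetical helper)."""
    data_rate_gbps: float
    quarter_rate_clock_ghz: float  # PLL output, also used by the RX
    half_rate_clock_ghz: float     # after the bundle clock multiplier, used by the TX
    unit_interval_ps: float        # duration of one bit


def clock_plan(data_rate_gbps: float) -> ClockPlan:
    """Derive the quarter-rate and half-rate clock frequencies for a data-rate."""
    if data_rate_gbps <= 0:
        raise ValueError("data rate must be positive")
    return ClockPlan(
        data_rate_gbps=data_rate_gbps,
        quarter_rate_clock_ghz=data_rate_gbps / 4,
        half_rate_clock_ghz=data_rate_gbps / 2,
        unit_interval_ps=1000.0 / data_rate_gbps,
    )


def bundle_count(lanes: int) -> int:
    """Number of 4-lane bundles needed for a lane count (ceiling division)."""
    if lanes < 0:
        raise ValueError("lane count must be non-negative")
    return -(-lanes // LANES_PER_BUNDLE)
```

At 32Gb/s, `clock_plan(32)` gives an 8GHz quarter-rate clock, a 16GHz half-rate TX clock, and a 31.25ps unit interval, so the bundle clock multiplier doubles the distributed frequency. Under the one-lane-per-cable-lane assumption, `bundle_count(8)` returns 2 for the 8-lane cable assembly.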
The output stage is a low-swing segmented SST driver using NMOS devices for channel termination. 3-tap TX pre-emphasis is implemented using switching devices to short the differential outputs, similar to [4]. The RX front-end consists of a CTLE and a 6-tap DFE. The CTLE has an active inductor to provide up to 4dB of peaking while minimizing silicon area [5]. Other than the LC-VCO, no passive inductors are used for bandwidth extension or I/O pad-capacitance reduction. A quadrature clock generator (Q-Gen) provides I/Q clocks for the 4-way interleaved DFE. For channels with less than 12dB of loss and data-rates up to 12Gb/s, a separate 2-way interleaved low-power (LP) latch with a 2× oversampled CDR is used, which reduces power consumption by allowing the CTLE and DFE to be disabled. There are two separate CDR logic blocks, one in the lane and one in the bundle, that independently control the quadrature phase interpolator (PI). The bundle CDR aggregates the phase-recovery information from all 4 lanes.

To minimize the area and pad capacitance of the bidirectional transceiver, the SST TX can be configured as the RX termination, as shown in Fig. 26.2.2. When functioning as an output...