Increasing demand for parallelism from out-of-order and multithreaded execution requires fast, dense arrays with multi-port capabilities. The load-store unit (LSU) of the POWER7™ microprocessor core has a 32kB L1 data cache composed of four 8kB blocks. In two-cycle back-to-back operation, it concurrently supports two independent reads and one write. The array is organized in banks of 16 cells each, and the two reads operate independently in any of these banks, including two reads within the same bank or even of the same cell. A bank selected for a write is blocked for any read operation; if a read and a write collide within the same bank, collision-control circuitry gives the write priority over the read. Each read port provides 4B from 1 of 256 locations, whereas the double-bandwidth write operation provides individual control of 8B to 128 locations.

Figure 19.2.1 shows the back-to-back data cache loop. The two operand muxes select among the general purpose registers (GPR), the feedback loop, and bypass operands from the other read port; the result feeds an adder stage that generates the read addresses (AGEN). The array output data passes through a formatter stage, and the result is driven back to the operand mux inputs. The cycle boundary at the array macro input is balanced between the two cycles to optimize the operating frequency, which is effectively determined by the whole back-to-back loop rather than by the data cache access itself.

Figure 19.2.2 shows the read/write-decoding scheme, which uses a standard 6T-SRAM cell in an effectively triple-port array. In an 8kB instance, the 256 entries are grouped into 16 banks of 16 6T-SRAM cells each. The 0.462µm² SRAM cell drives a low bitline (BL) load and is optimized for performance; its two pass-gate devices are connected to separate wordlines (WLs), wl_t and wl_c, and to separate local BLs, blt and blc. A single-ended read is initiated by activating the WL connected to one of the pass-gates.
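The bank-level port rules described above can be captured in a small behavioral model. The Python sketch below is illustrative only and does not represent the POWER7 circuit: it assumes the entry index splits into a bank number (upper 4 bits) and a row within the bank (lower 4 bits), it abstracts away data widths, and the names `BankedArray`, `split_address`, and `cycle` are hypothetical. A read that targets the bank selected for the write returns `None`, modeling a read blocked under write-over-read priority.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ENTRIES = 256          # entries per 8kB instance, as stated in the text
CELLS_PER_BANK = 16    # 16 banks of 16 cells each
READ_PORTS = 2         # two independent read ports


def split_address(addr: int) -> Tuple[int, int]:
    """Return (bank, row). Assumption: bank = upper bits, row = lower bits."""
    if not 0 <= addr < ENTRIES:
        raise ValueError(f"entry index {addr} out of range")
    return divmod(addr, CELLS_PER_BANK)


@dataclass
class CycleResult:
    read_data: List[Optional[int]]  # None marks a read blocked by the write bank


class BankedArray:
    """Hypothetical behavioral model: up to two reads and one write per cycle."""

    def __init__(self) -> None:
        self.cells = [0] * ENTRIES

    def cycle(self, reads: List[int],
              write: Optional[Tuple[int, int]] = None) -> CycleResult:
        if len(reads) > READ_PORTS:
            raise ValueError("at most two reads per cycle")
        write_bank = split_address(write[0])[0] if write is not None else None
        data: List[Optional[int]] = []
        for addr in reads:
            bank, _ = split_address(addr)
            if bank == write_bank:
                data.append(None)              # write-over-read: bank is blocked
            else:
                data.append(self.cells[addr])  # same-bank or same-cell reads are fine
        if write is not None:
            waddr, value = write
            self.cells[waddr] = value
        return CycleResult(data)


if __name__ == "__main__":
    arr = BankedArray()
    arr.cells[5] = 7
    print(arr.cycle([5, 5], write=(40, 1)).read_data)   # [7, 7]
    print(arr.cycle([41, 5], write=(33, 3)).read_data)  # [None, 7]
```

In the second call, entries 41 and 33 both fall in bank 2, so only the read of entry 5 (bank 0) completes in that cycle.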
For a write, both WLs of the selected cell are activated to perform a differential write. The two-stage decoder is organized as a bank select (msb) followed by a row select within the bank (lsb). The collision case is handled in a read/write bank-control stage. If a bank m is not selected for a write (wr_msb
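The two-stage decode and wordline usage can likewise be sketched in Python. This is a hypothetical model, not the actual decoder: it assumes an 8-bit entry index whose upper 4 bits (msb) select one of 16 banks and whose lower 4 bits (lsb) select one of 16 rows, and it assumes that a single-ended read fires `wl_t` only, since the text states only that one of the two WLs is used. A write fires both `wl_t` and `wl_c` for the differential write.

```python
from dataclasses import dataclass
from typing import List

BANKS = 16
ROWS_PER_BANK = 16


def one_hot(index: int, width: int) -> List[int]:
    """Return a one-hot select vector of the given width."""
    return [1 if i == index else 0 for i in range(width)]


@dataclass
class DecodeResult:
    bank_sel: List[int]  # stage 1: bank select from the address msb
    wl_t: List[int]      # true-side wordlines within the selected bank
    wl_c: List[int]      # complement-side wordlines within the selected bank


def decode(addr: int, is_write: bool) -> DecodeResult:
    if not 0 <= addr < BANKS * ROWS_PER_BANK:
        raise ValueError(f"entry index {addr} out of range")
    msb, lsb = addr >> 4, addr & 0xF          # bank select, row select
    row = one_hot(lsb, ROWS_PER_BANK)
    # Assumption: reads are single-ended on wl_t; writes drive both WLs.
    wl_c = list(row) if is_write else [0] * ROWS_PER_BANK
    return DecodeResult(one_hot(msb, BANKS), row, wl_c)
```

For example, entry `0x2B` decodes to bank 2, row 11; a read raises only `wl_t[11]`, while a write raises both `wl_t[11]` and `wl_c[11]`.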
An Instruction Window Buffer (IWB) addresses the challenges of microprocessor design beyond 1GHz. The IWB implements the renaming, reservation-station, and reorder-buffer functions of the processor as a single unified buffer. Measured results on an experimental chip demonstrate operation of the IWB macros at 1.8GHz, for a chip at the fast end of the process distribution. The chip is fabricated in 0.18µm CMOS8S bulk technology with 7 levels of copper interconnect and a 1.5V supply. The IWB is implemented using static and delayed-reset dynamic circuit macros [1].
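To illustrate what merging renaming, reservation station, and reorder buffer into one structure can mean, the Python sketch below models a generic unified instruction window at the behavioral level. It is an assumption-laden illustration, not the IWB design measured on the experimental chip: each entry's tag doubles as its rename tag, a per-entry wait set plays the reservation-station role, and in-order retirement from the head plays the reorder-buffer role. All class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Set


@dataclass
class Entry:
    tag: int                 # entry tag, also used as the rename tag
    dest: Optional[int]      # architectural destination register, if any
    waiting_on: Set[int]     # tags of producers that have not completed
    issued: bool = False
    done: bool = False


class UnifiedWindow:
    """Hypothetical unified buffer: rename + wakeup/select + in-order retire."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.window: Deque[Entry] = deque()   # program order (reorder role)
        self.by_tag: Dict[int, Entry] = {}
        self.rename_map: Dict[int, int] = {}  # arch reg -> youngest producer tag
        self.next_tag = 0

    def dispatch(self, dest: Optional[int], srcs: List[int]) -> Optional[int]:
        if len(self.window) >= self.size:
            return None                       # window full: dispatch stalls
        waiting = set()
        for reg in srcs:                      # rename role: map sources to tags
            tag = self.rename_map.get(reg)
            if tag is not None and not self.by_tag[tag].done:
                waiting.add(tag)
        entry = Entry(self.next_tag, dest, waiting)
        self.next_tag += 1
        self.window.append(entry)
        self.by_tag[entry.tag] = entry
        if dest is not None:
            self.rename_map[dest] = entry.tag
        return entry.tag

    def select(self) -> List[int]:
        """Reservation-station role: issue every ready, unissued entry."""
        ready = [e.tag for e in self.window if not e.issued and not e.waiting_on]
        for tag in ready:
            self.by_tag[tag].issued = True
        return ready

    def complete(self, tag: int) -> None:
        self.by_tag[tag].done = True
        for e in self.window:                 # wake up dependent entries
            e.waiting_on.discard(tag)

    def retire(self) -> List[int]:
        """Reorder-buffer role: retire completed entries in program order."""
        retired = []
        while self.window and self.window[0].done:
            e = self.window.popleft()
            del self.by_tag[e.tag]
            if e.dest is not None and self.rename_map.get(e.dest) == e.tag:
                del self.rename_map[e.dest]
            retired.append(e.tag)
        return retired
```

In this sketch, a single allocation per instruction replaces separate allocations in a rename table, a reservation station, and a reorder buffer.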