The increase in on-chip core counts in Chip MultiProcessors (CMPs) has led to the adoption of interconnects such as Mesh and Torus, which consume an increasing fraction of the chip power. Moreover, as technology and voltage continue to scale down, static power consumes a larger fraction of the total power; reducing it is increasingly important for energy-proportional computing. Currently, processor designers strive to send under-utilized cores into deep sleep states in order to reduce idling power and improve overall energy efficiency. However, even in state-of-the-art CMP designs, when a core goes to sleep the router attached to it remains active in order to continue packet forwarding. In this paper, we propose Router Parking: selectively power-gating routers attached to parked cores. Router Parking ensures that network connectivity is maintained, and limits the average interconnect latency impact of packet detouring around parked routers. We present two Router Parking algorithms: an aggressive approach that parks as many routers as possible, and a conservative approach that parks a limited set of routers in order to keep the latency increase minimal. Further, we propose an adaptive policy to choose between the two algorithms at run-time. We evaluate our algorithms using both synthetic traffic and real workloads taken from the SPEC CPU2006 and PARSEC 2.1 benchmark suites. Our evaluation results show that Router Parking can achieve significant savings in the total interconnect energy (average of 32%, 40% and 41% for the synthetic, SPEC CPU2006, and PARSEC 2.1 workloads, respectively).
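The abstract does not spell out either parking algorithm, so the following is only an illustrative sketch of the connectivity constraint it describes, not the authors' method. It assumes a 2D mesh, a greedy visiting order, and a breadth-first reachability check; the names `mesh_neighbors`, `is_connected`, and `park_routers` are hypothetical. A router attached to a sleeping core is parked only if the remaining active routers stay mutually reachable.

```python
from collections import deque


def mesh_neighbors(node, width, height):
    """Yield the mesh neighbors of (x, y) that lie inside the grid."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def is_connected(active, width, height):
    """Return True if every active router can reach every other one
    using only links between active routers (BFS from any one node)."""
    if not active:
        return True
    start = next(iter(active))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in mesh_neighbors(node, width, height):
            if nb in active and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == active


def park_routers(width, height, sleeping_cores):
    """Greedily park routers of sleeping cores (assumed policy, for
    illustration): skip any router whose removal would disconnect
    the remaining active routers."""
    active = {(x, y) for x in range(width) for y in range(height)}
    parked = set()
    for node in sorted(sleeping_cores):
        candidate = active - {node}
        if is_connected(candidate, width, height):
            active = candidate
            parked.add(node)
    return parked


if __name__ == "__main__":
    # Middle column of a 3x3 mesh is asleep; one of its routers must
    # stay on so the left and right columns remain connected.
    print(sorted(park_routers(3, 3, {(1, 0), (1, 1), (1, 2)})))
```

In this sketch the outcome depends on the visiting order, and detour latency is ignored entirely; a conservative variant in the spirit of the abstract would additionally refuse to park a router when the resulting path lengths grow beyond some threshold.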
Computation at the edge of a datacenter has unique characteristics. It deals with streaming data from multiple sources, going to multiple destinations, often requiring repeated application of one or more of several standard algorithmic kernels. These kernels, related to encryption, compression, XML parsing and regular expression searching on the data, demand a high data processing rate and power efficiency. This suggests the use of hardware acceleration for key functions. However, robust general purpose processing support is necessary to orchestrate the flow of data between accelerators, as well as perform tasks that are not suited to acceleration. Further, these accelerators must be tightly integrated with the general purpose computation in order to keep invocation overhead and latency low. The accelerators must be easy for software to use, and the system must be flexible enough to support evolving networking standards. In this article, we describe and evaluate the architecture of IBM’s PowerEN processor, with a focus on PowerEN’s architectural enhancements and its on-chip hardware accelerators. PowerEN unites the throughput of application-specific accelerators with the programmability of general purpose cores on a single coherent memory architecture. Hardware acceleration improves throughput by orders of magnitude in some cases compared to equivalent computation on the general purpose cores. By offloading work to the accelerators, general purpose cores are freed to simultaneously work on computation less suited to acceleration.