In 2001, IBM delivered to the marketplace a high-performance UNIX ® -class eServer based on a four-chip multichip module (MCM) code named Regatta. This MCM supports four POWER4 chips, each with 170 million transistors, which utilize the IBM advanced copper back-end interconnect technology. Each chip is attached to the MCM through 7018 flip-chip solder connections. The MCM, fabricated using the IBM high-performance glass-ceramic technology, features 1.7 million internal copper vias and high-density topsurface contact pad arrays with 100-m pads on 200-m centers. Interconnections between chips on the MCM and interconnections to the board for power distribution and MCM-to-MCM communication are provided by 190 meters of co-sintered copper wiring. Additionally, the 5100 off-module connections on the bottom side of the MCM are fabricated at a 1-mm pitch and connected to the board through the use of a novel land grid array technology, thus enabling a compact 85-mm ؋ 85-mm module footprint that enables 8-to 32-way systems with processors operating at 1.1 GHz or 1.3 GHz. The MCM also incorporates advanced thermal solutions that enable 156 W of cooling per chip. This paper presents a detailed overview of the fabrication, assembly, testing, and reliability qualification of this advanced MCM technology.
Power management through dynamic voltage and frequency scaling (DVFS) is one of the most widely adopted techniques. However, it impacts application reliability (due to soft errors, circuit aging, and deadline misses). However, increased power density impacts the thermal reliability of the chip, sometimes leading to permanent failure. To balance both application- and thermal-reliability along with achieving power savings and maintaining performance, we propose application- and thermal-reliability-aware reinforcement learning–based multi-core power management in this work. The proposed power management scheme employs a reinforcement learner to consider the power savings and variations in the application and thermal reliability caused by DVFS. To overcome the computational overhead, the power management decisions are determined at the application-level rather than per-core or system-level granularity. Experimental evaluation of proposed multi-core power management on a microprocessor with up to 32 cores, running PARSEC applications, was done to demonstrate the applicability and efficiency of the proposed technique. Compared to the existing state-of-the-art techniques, the proposed technique enables an average energy savings of up to ∼20%, up to 4.926°C temperature reduction without degradation in the application- and thermal-reliability.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.