Die-stacked DRAM is a technology that will soon be integrated in high-performance systems. Recent studies have focused on hardware caching techniques to make use of the stacked memory, but these approaches require complex changes to the processor and cannot leverage the stacked memory to increase the system's overall memory capacity. In this work, we explore the challenges of exposing the stacked DRAM as part of the system's physical address space. This non-uniform memory access (NUMA)-styled approach greatly simplifies the hardware and increases the physical memory capacity of the system, but it pushes the burden of managing the heterogeneous memory architecture (HMA) onto the software layers. We first explore simple (and somewhat impractical) schemes to manage the HMA, and then refine the mechanisms to address a variety of hardware and software implementation challenges. In the end, we present an HMA approach with low hardware and software impact that can dynamically tune itself to different application scenarios, achieving performance even better than the (impractical-to-implement) baseline approaches.
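The abstract does not spell out the migration mechanism itself, so the following is only a rough, hypothetical Python sketch of the kind of software-managed policy such a NUMA-styled HMA implies: an interval-based scheme that promotes the most frequently accessed pages into a limited-capacity die-stacked region. The function names, capacity, and counters are illustrative assumptions, not the authors' implementation.

    # Illustrative sketch (not the paper's implementation): an interval-based
    # policy that promotes the hottest pages into a small fast-memory region,
    # roughly the kind of OS-level management a NUMA-styled HMA would need.

    FAST_CAPACITY_PAGES = 4  # assumed die-stacked DRAM capacity, in pages

    def pick_pages_for_fast_memory(access_counts):
        """Return the page IDs that should live in fast memory next interval.

        access_counts maps page_id -> accesses observed this interval
        (e.g., gathered from hardware counters or page-table access bits).
        """
        hottest = sorted(access_counts, key=access_counts.get, reverse=True)
        return set(hottest[:FAST_CAPACITY_PAGES])

    def plan_migrations(current_fast_pages, access_counts):
        """Decide which pages to promote/demote at the end of an interval."""
        target = pick_pages_for_fast_memory(access_counts)
        promote = target - current_fast_pages   # move into die-stacked DRAM
        demote = current_fast_pages - target    # move back to off-package DRAM
        return promote, demote

    # Example: pages 7 and 9 become hot and displace colder resident pages.
    counts = {1: 3, 2: 1, 4: 5, 5: 2, 7: 120, 9: 80}
    print(plan_migrations({1, 2, 4, 5}, counts))  # promotes {7, 9}, demotes {2, 5}

A real system would also have to bound migration traffic and remap addresses, which is part of what the refined mechanisms in the paper address.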
Scientific and engineering applications have demanded increasingly powerful computer systems. High-performance computing (HPC) has steadily pushed supercomputers to greater computational capabilities, with petascale computing (10^15 floating-point operations per second [flops]) first being achieved in 2008.¹ The current fastest supercomputer is capable of 33.86 petaflops (see www.top500.org). These HPC systems perform massive computations to drive important scientific experiments leading to impactful discoveries, from designing more efficient fuels and engines to engineering safer bridges and buildings, and from modeling global climate phenomena to exploring the origins of the universe.

The next HPC milestone is developing an exascale supercomputer that can achieve more than an exaflop (10^18, or one billion billion, flops) on critical scientific computing applications. Such exascale supercomputers are envisioned to comprise at least 100,000 interconnected servers or nodes, implying that each node has an individual computing capability of greater than 10 teraflops (Tflops) on real applications. Note that a modern high-end discrete GPU today achieves a peak of only about three double-precision teraflops.

The challenges associated with exascale computing, however, extend far beyond merely achieving a certain number of floating-point calculations per second. To feed such high levels of computation, both the system's memory and internode communication bandwidths must increase dramatically beyond current levels. The energy efficiency of the supercomputer must improve by orders of magnitude to enable operation within practical datacenter power-delivery capabilities of a few tens of megawatts. With more than 100,000 nodes, significant advances in resilience and reliability are also required to keep the overall machine up and running. In this article, we describe AMD Research's vision for exascale computing, and in particular, how we see heterogeneous computing as the path forward.

AMD's vision for exascale computing

An exascale system consists of the hardware implementing the computational resources, as well as the software needed to efficiently write, tune, and execute applications on the hardware.
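As a quick sanity check on the per-node figure quoted above, dividing an exaflop across roughly 100,000 nodes yields the stated requirement of more than 10 Tflops per node; the short calculation below makes this explicit.

    # Back-of-the-envelope check of the per-node figure cited above.
    total_flops = 1e18            # one exaflop
    nodes = 100_000               # envisioned node count
    per_node_tflops = total_flops / nodes / 1e12
    print(per_node_tflops)        # -> 10.0 (Tflops per node)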
Processing-in-Memory (PIM) is the concept of moving computation as close as possible to memory. This reduces the movement of data between the central processor and the memory system, and hence improves energy efficiency through reduced memory traffic. In this paper, we present our approach to embedding processing cores in 3D-stacked memories and evaluate the use of such a system for Big Data analytics. We present a simple server architecture that employs several energy-efficient PIM cores in multiple 3D-DRAM units, where the server acts as a node of a cluster for Big Data analyses using the MapReduce programming framework. Our preliminary analyses show that, on a single node, up to 23% energy savings on the processing units can be achieved while reducing execution time by up to 8.8%. Additional energy savings can result from simplifying the system memory buses. We believe such energy-efficient systems with PIM capability will become viable in the near future because of their potential to scale the memory wall.
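The abstract does not detail how work is divided among the in-memory cores; the sketch below is only a hypothetical illustration of the MapReduce-style partitioning it describes, with a map task running on a PIM core against its local stack and the host merging partial results. All names here are assumptions, not the paper's framework.

    # Hypothetical sketch of partitioning MapReduce-style work across PIM
    # cores (one per 3D-DRAM stack); names are assumptions, not the paper's
    # actual framework.

    from collections import Counter

    def map_on_pim_core(lines):
        """Word-count map phase that would run locally on one PIM core,
        touching only the data resident in its own memory stack."""
        counts = Counter()
        for line in lines:
            counts.update(line.split())
        return counts

    def reduce_on_host(partials):
        """The host CPU merges the partial results gathered from each stack."""
        total = Counter()
        for partial in partials:
            total.update(partial)
        return total

    stacks = [["big data big"], ["data analytics"], ["big analytics"]]
    partials = [map_on_pim_core(lines) for lines in stacks]
    print(reduce_on_host(partials))  # Counter({'big': 3, 'data': 2, 'analytics': 2})

The point of such a split is that each map task touches only data already resident in its stack, which is where the reduced memory traffic and energy savings come from.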