Resistive RAM (RRAM) is a promising emerging Non-Volatile Memory candidate due to its scalability and CMOS compatibility, which enables the fabrication of high density RRAM crossbar arrays in Back-End-Of-Line CMOS processes. Fast and accurate architectural models of RRAM crossbar devices are required to perform system level design space explorations of new Storage Class Memory (SCM) architectures using RRAM e.g. Non-Volatile-DIMM-P (NVDIMM-P). The major challenge in architectural modeling is the trade-off between accuracy and computing intensity. In this paper we present RRAMSpec, an architecture design space exploration framework, which enables fast exploration of various architectural trade-offs in designing high density RRAM devices, at accuracy levels close to circuit level simulators. The framework estimates silicon area, timings, and energy for RRAM devices. It outperforms stateof-the-art RRAM modeling tools by conducting architectural explorations at very high accuracy levels within few seconds of execution time. Our evaluations show various trade-offs in designing RRAM crossbar arrays with respect to array sizes, write time and write energy. Finally we present the influence of technology scaling on different RRAM design trade-offs.
Heterogeneous accelerator enhanced computing architectures are a common solution in embedded computing, mainly due to the constraints in energy and power efficiency. Such accelerator enhanced systems dispatch data- and computing-intensive tasks to specialized, optimized and thus efficient hardware units, leaving most control flow tasks for the more generic but less efficient central processing units (CPUs). Nowadays, also high-performance computing (HPC) systems are becoming more heterogeneous by incorporating accelerators into the computing nodes.In this chapter, we introduce the concept of heterogeneous computing and present the design of a hardware accelerator for solving the Link Assessment (LA) problem, in introduced Chapter 3. The hardware accelerator integrates its main dedicated processing units with a customized cache design and light-weight data path. We provide detailed area, energy, and timing results for a 28 nm application specific integrated circuit (ASIC) process and DDR3 memory devices. Compared to an CPU-based cluster, our proposed solution uses 38x less memory and is 1030x more energy efficient for processing a users-movies dataset with half a million edges.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.