2011 IEEE International Parallel & Distributed Processing Symposium
DOI: 10.1109/ipdps.2011.420

Power, Programmability, and Granularity: The Challenges of ExaScale Computing

Abstract: Reaching an ExaScale computer by the end of the decade, and enabling the continued performance scaling of smaller systems requires significant research breakthroughs in three key areas: power efficiency, programmability, and execution granularity. To build an ExaScale machine in a power budget of 20MW requires a 200-fold improvement in energy per instruction: from 2nJ to 10pJ. Only 4x is expected from improved technology. The remaining 50x must come from improvements in architecture and circuits. To program a …
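As a quick sanity check on the abstract's arithmetic, the short Python sketch below (the variable names and layout are ours; the figures come from the abstract) reproduces the budget: an exaflop machine within 20 MW allows roughly 20 pJ per operation, and moving from 2 nJ to 10 pJ per instruction is the 200-fold improvement that factors into 4x from technology and 50x from architecture and circuits.

ops_per_second = 1e18                    # one exa-op per second
power_budget_w = 20e6                    # 20 MW power envelope
print(power_budget_w / ops_per_second)   # 2e-11 J, i.e. ~20 pJ available per operation
print(2e-9 / 10e-12)                     # 200.0 -> the 200-fold energy-per-instruction improvement
print(4 * 50)                            # 200   -> 4x from technology times 50x from architecture/circuits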

Cited by 46 publications (27 citation statements), published between 2012 and 2019. References: 0 publications.

Citation statements (ordered by relevance):
“…These models enable the analysis of data transfer between two levels of the memory hierarchy. Lower data transfer complexity implies better data locality and, therefore, higher energy efficiency since energy consumption caused by data transfer dominates the total energy consumption [18].…”
Section: Bounded Ideal Cache Model
confidence: 99%
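As a rough illustration of the quoted point (our own sketch, not taken from the citing paper), data-movement energy can be modeled as the number of words transferred between two memory levels times the energy per transferred word; the classical gap in transfer complexity between an untiled and a cache-blocked matrix multiply then becomes an energy gap directly. The matrix size, cache capacity, and 300 pJ-per-word figure below are assumptions for illustration only.

def movement_energy_joules(words_moved, pj_per_word):
    # total data-movement energy = number of transfers x energy per transfer
    return words_moved * pj_per_word * 1e-12

n = 4096                    # matrix dimension (assumed)
cache_words = 1 << 20       # fast-memory capacity in words (assumed)
pj_per_word = 300           # energy per off-chip word transfer (assumed)

naive_moves = n ** 3                           # untiled matmul: ~n^3 words cross the boundary
blocked_moves = n ** 3 / cache_words ** 0.5    # tiled matmul: O(n^3 / sqrt(M)) words

print(movement_energy_joules(naive_moves, pj_per_word))    # ~20.6 J of pure data movement
print(movement_energy_joules(blocked_moves, pj_per_word))  # ~0.02 J, roughly 1000x less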
“…Unlike conventional locality-aware data structures and algorithms, which only consider whether the data is on-chip (e.g., in cache) or not (e.g., in DRAM), new energy-efficient data structures and algorithms must consider data locality at a finer granularity: where on the chip the data is. It is estimated that for chips using the 10nm technology, the energy gap between accessing data in nearby on-chip memory (e.g., data in SRAM) and accessing data across the chip (e.g., on-chip data at a distance of 10mm) will be as much as 75x (2pJ versus 150pJ), whereas the energy gap between accessing on-chip data and accessing off-chip data (e.g., data in DRAM) will be only 2x (150pJ versus 300pJ) [18]. Therefore, in order to construct energy-efficient software systems, data structures and algorithms should support not only high parallelism but also …”
Section: Introduction
confidence: 99%
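Plugging the per-access energies quoted above (2 pJ for nearby SRAM, 150 pJ for an access across the chip, 300 pJ for off-chip DRAM) into a few lines of Python reproduces the 75x and 2x gaps; the access count is arbitrary and only there to express the totals in joules.

E_NEAR_PJ, E_FAR_PJ, E_DRAM_PJ = 2, 150, 300    # figures quoted from [18] above

print(E_FAR_PJ / E_NEAR_PJ)    # 75.0 -> where on the chip the data sits matters 75x
print(E_DRAM_PJ / E_FAR_PJ)    # 2.0  -> leaving the chip "only" doubles it again

accesses = 1_000_000           # arbitrary access count, for scale
for label, pj in (("near SRAM", E_NEAR_PJ), ("across chip", E_FAR_PJ), ("DRAM", E_DRAM_PJ)):
    print(label, accesses * pj * 1e-12, "J")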
“…They have been adopted in many supercomputers, e.g., Titan, Stampede, and Tianhe-2, mainly for two purposes: (1) improving the performance, and (2) reducing the overall power consumption [1]. As GPUs are becoming ubiquitous in HPC, numerous applications have been ported to GPU-based systems over the past several years, including large-scale scientific applications on GPU clusters [2]-[4].…”
Section: Introduction
confidence: 99%
“…With each level of the memory hierarchy that a data transfer crosses (e.g. between on-chip caches, or from last-level cache to DRAM), the energy consumption of the transfer increases by one order of magnitude or more [4]. The memory hierarchy remains the most important performance factor in computing systems, as latency keeps lagging bandwidth [5].…”
Section: Introduction
confidence: 99%
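A minimal sketch of the order-of-magnitude-per-level rule of thumb in the quote above, assuming a 1 pJ baseline for a first-level hit and a factor of ten for every additional level a transfer has to cross (both numbers are illustrative, not measured values):

BASE_PJ, FACTOR = 1.0, 10.0
for depth, level in enumerate(["L1", "L2", "last-level cache", "DRAM"]):
    print(f"{level}: ~{BASE_PJ * FACTOR ** depth:.0f} pJ per access")
# Under this assumption a transfer that goes all the way to DRAM costs ~1000x an L1 hit
# in energy, which is why the memory hierarchy dominates both power and performance.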