2013
DOI: 10.1145/2508148.2485952
An energy-efficient and scalable eDRAM-based register file architecture for GPGPU

Abstract: The heavily threaded data-processing demands of the streaming multiprocessors (SMs) in a GPGPU require a large register file (RF). The rapidly increasing size of the RF makes its area cost and power consumption unaffordable for traditional SRAM designs in future technologies. In this paper, we propose to use embedded DRAM (eDRAM) as an alternative in future GPGPUs. Compared with SRAM, eDRAM provides higher density and lower leakage power. However, the limited data retention time of eDRAM poses new challenges. Per…
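The trade-off the abstract points to (lower leakage in eDRAM, but a refresh obligation bounded by its data retention time) can be sketched with a toy energy model. The Python snippet below is purely illustrative; every parameter value and function name is an assumption for the sketch, not a figure or method from the paper.

```python
# Back-of-the-envelope sketch (not from the paper) of the SRAM-vs-eDRAM trade-off:
# eDRAM trades SRAM's leakage power for periodic refresh cost, and every cell must
# be refreshed at least once per retention period. All numbers are illustrative.

def rf_energy_sram(capacity_kb, leak_nw_per_kb, seconds):
    """Static (leakage) energy of an SRAM register file over an interval, in joules."""
    return capacity_kb * leak_nw_per_kb * 1e-9 * seconds

def rf_energy_edram(capacity_kb, leak_nw_per_kb, refresh_nj_per_kb,
                    retention_ms, seconds):
    """Leakage plus refresh energy of an eDRAM register file, in joules."""
    leakage = capacity_kb * leak_nw_per_kb * 1e-9 * seconds
    refreshes = seconds / (retention_ms * 1e-3)      # refresh rounds in the interval
    refresh = capacity_kb * refresh_nj_per_kb * 1e-9 * refreshes
    return leakage + refresh

if __name__ == "__main__":
    # Hypothetical 256 KB per-SM register file observed over 1 ms of execution.
    sram = rf_energy_sram(256, leak_nw_per_kb=50.0, seconds=1e-3)
    edram = rf_energy_edram(256, leak_nw_per_kb=5.0, refresh_nj_per_kb=2.0,
                            retention_ms=0.05, seconds=1e-3)
    print(f"SRAM  RF energy: {sram:.3e} J")
    print(f"eDRAM RF energy: {edram:.3e} J")
```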

Cited by 17 publications (6 citation statements) | References 26 publications

Citation statements (ordered by relevance):
“…GPU architecture. Other related work optimizes various aspects of the GPU architecture, e.g., warp scheduling [33], [39], [43], [56], [66], L1 cache management [31], [59], [65], [68], register file design [3], [30], [32], NoC optimization [10], [35], [73], [77], [78], and SM resource virtualization [64], [72]. Recent work also provides approaches for efficient multitasking in GPUs [4], [52], [53], [62], [67], [71], [76], virtual memory management [9], and design considerations for multi-module GPUs [8], [45].…”
Section: Related Work
confidence: 99%
“…Therefore, the RAT can be used not only to determine whether a register is allocated in the register file or in scratchpad memory, but also to calculate the register's address in scratchpad memory. When a scratchpad memory region is allocated for a CTA (CTA_ID), the SBR value of that region is calculated using Equation (11), where S is the capacity of the scratchpad memory:…”
Section: Register Allocation
confidence: 99%
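As a rough illustration of the mechanism described in the excerpt above, the sketch below models a register allocation table (RAT) that records, per register, whether it lives in the register file or in scratchpad memory, plus a per-CTA scratchpad base register (SBR). Equation (11) is not reproduced in the excerpt, so the SBR computation here is a hypothetical placeholder; all class names, fields, and sizes are assumptions for illustration only.

```python
# Illustrative sketch of a RAT that decides register-file vs. scratchpad placement
# and derives scratchpad addresses. The SBR formula is a hypothetical stand-in for
# Equation (11), which is not shown in the excerpt.
from dataclasses import dataclass

@dataclass
class RatEntry:
    in_scratchpad: bool   # True -> the register was placed in scratchpad memory
    offset: int           # word offset within the CTA's scratchpad region

class RegisterAllocationTable:
    def __init__(self, scratchpad_capacity, region_size_per_cta):
        self.S = scratchpad_capacity        # total scratchpad capacity S (words)
        self.region = region_size_per_cta   # per-CTA region size (words), assumed fixed
        self.entries = {}                   # (cta_id, reg_id) -> RatEntry
        self.sbr = {}                       # cta_id -> scratchpad base register

    def allocate_region(self, cta_id):
        # Hypothetical stand-in for Equation (11): place CTA regions back to back,
        # wrapping within the scratchpad capacity S.
        self.sbr[cta_id] = (cta_id * self.region) % self.S

    def map_register(self, cta_id, reg_id, in_scratchpad, offset=0):
        self.entries[(cta_id, reg_id)] = RatEntry(in_scratchpad, offset)

    def resolve(self, cta_id, reg_id):
        """Return ('RF', reg_id) or ('SCRATCHPAD', absolute word address)."""
        e = self.entries[(cta_id, reg_id)]
        if not e.in_scratchpad:
            return ("RF", reg_id)
        return ("SCRATCHPAD", self.sbr[cta_id] + e.offset)

# Example: CTA 2 keeps r0 in the RF and places r5 in its scratchpad region.
rat = RegisterAllocationTable(scratchpad_capacity=4096, region_size_per_cta=256)
rat.allocate_region(cta_id=2)
rat.map_register(2, 0, in_scratchpad=False)
rat.map_register(2, 5, in_scratchpad=True, offset=16)
print(rat.resolve(2, 0))   # ('RF', 0)
print(rat.resolve(2, 5))   # ('SCRATCHPAD', 528)
```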
“…Although Gebhart et al. [7] propose a unified on-chip memory structure in which the capacity of the register file, scratchpad memory, and L1 cache can be partitioned at runtime, in a fine-grained way, according to application requirements, there are still two shortcomings. First, the unified structure lacks flexibility: the register file is one of the main contributors to GPU energy consumption, and various power-saving techniques [11,14,23,32-34] have been proposed for it, which can be hard to apply to the unified structure because the register file and the L1 cache have different access characteristics. Second, the unified structure increases bank conflicts among the register file, scratchpad memory, and L1 cache; the authors use a software-managed hierarchical register file [6] to reduce the bandwidth required of the main register file, but that technique focuses on energy efficiency and may lead to resource underutilization and suboptimal performance [29,35].…”
Section: Evaluation For Advanced Architecture
confidence: 99%
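To make the runtime-partitioning idea in the excerpt above concrete, the following toy sketch splits a fixed unified on-chip SRAM budget among register file, scratchpad memory, and L1 cache based on a kernel's declared needs. The budget, bank granularity, and leftover-to-L1 policy are assumptions for illustration only; they do not reproduce Gebhart et al.'s design.

```python
# Toy sketch of runtime partitioning of a unified on-chip memory budget among the
# register file, scratchpad memory, and L1 cache. The 64 KB budget, 8 KB banks, and
# leftover-to-L1 policy are illustrative assumptions, not the cited scheme.

BANK_KB = 8          # partitioning granularity (assumed)
TOTAL_KB = 64        # unified on-chip memory per SM (assumed)

def round_up_to_bank(kb):
    return ((kb + BANK_KB - 1) // BANK_KB) * BANK_KB

def partition(regfile_need_kb, scratchpad_need_kb):
    """Return (rf_kb, scratchpad_kb, l1_kb); leftover capacity goes to the L1 cache."""
    rf = round_up_to_bank(regfile_need_kb)
    sp = round_up_to_bank(scratchpad_need_kb)
    if rf + sp > TOTAL_KB:
        raise ValueError("kernel requirements exceed the unified memory budget")
    return rf, sp, TOTAL_KB - rf - sp

# Example: a kernel needing 28 KB of registers and 12 KB of scratchpad memory.
print(partition(28, 12))   # (32, 16, 16)
```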
“…(1) DVFS (dynamic voltage/frequency scaling)-based techniques [Jiao et al. 2010; Lee et al. 2011; Ma et al. 2012; Cebrian et al. 2012; Sheaffer et al. 2005b; Chang et al. 2008; Ren 2011; Anzt et al. 2011; Ren et al. 2012; Zhao et al. 2012; Huo et al. 2012; Keller and Gruber 2010; Abe et al. 2012; Park et al. 2006; Paul et al. 2013]; (2) CPU-GPU workload division-based techniques [Takizawa et al. 2008; Rofouei et al. 2008; Ma et al. 2012; Hamano et al. 2009] and GPU workload consolidation; (3) architectural techniques for saving energy in specific GPU components, such as caches [Lee et al. 2011; Lashgar et al. 2013; Arnau et al. 2012; Rogers et al. 2013; Lee and Kim 2012], global memory [Wang et al. 2013; Rhu et al. 2013], pixel shader [Pool et al. 2011], vertex shader [Pool et al. 2008], and the core data path, registers, pipeline, and thread scheduling [Chu et al. 2011; Gebhart et al. 2011; Jing et al. 2013]. We now discuss these techniques in detail. As seen through the previous classification, several techniques can be classified into more than one group.…”
Section: Overview
confidence: 99%