Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture 2011
DOI: 10.1145/2155620.2155675

A compile-time managed multi-level register file hierarchy

Abstract: As processors increasingly become power limited, performance improvements will be achieved by rearchitecting systems with energy efficiency as the primary design constraint. While some of these optimizations will be hardware based, combined hardware and software techniques likely will be the most productive. This work redesigns the register file system of a modern throughput processor with a combined hardware and software solution that reduces register file energy without harming system performance. Throughput…

Cited by 62 publications (43 citation statements)
References 29 publications
“…To improve on the register energy efficiency, we adopt a compiler controlled hierarchical register file implementation [23], [24]. In our design, the operand register file (ORF) contains just 8 entries per thread, while each thread may have 32-256 registers in the main register file (MRF).…”
Section: B Throughput Optimized Core Architecture
Citation type: mentioning
confidence: 99%
“…Gebhart et al explore several register allocation algorithms and propose a compiler specifiable register file hierarchy that allows sharing of temporary register file resources among running threads, reducing the usage of this energy hogging resource [19], [12]. They also propose a unified scratch, register and primary cache that can be configured at runtime to minimize the access latencies [20].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Gebhart et al [4] proposed register file caching and two-level thread scheduler to reduce the number of reads and writes to the large main register file and save its dynamic energy. The authors further extended their work to the compiler level and explored register allocation algorithms to improve register energy efficiency [5]. Yu et al integrated embedded DRAM and SRAM cells to reduce area and energy [3].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…For example, Nvidia Fermi GPU supports more than 20,000 parallel threads and contains 2MB register files [7]. Accessing such sizeable register files leads to massive power consumption [2][3][4][5][6]. It has been reported that the register files consume 15%-20% of the GPU stream multiprocessor's power [8].…”
Section: Introduction
Citation type: mentioning
confidence: 99%