2012 45th Annual IEEE/ACM International Symposium on Microarchitecture 2012
DOI: 10.1109/micro.2012.18

Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor

Abstract: Modern throughput processors such as GPUs employ thousands of threads to drive high-bandwidth, long-latency memory systems. These threads require substantial on-chip storage for registers, cache, and scratchpad memory. Existing designs hard-partition this local storage, fixing the capacities of these structures at design time. We evaluate modern GPU workloads and find that they have widely varying capacity needs across these different functions. Therefore, we propose a unified local memory which can dynamicall…

Cited by 94 publications (61 citation statements)
References 20 publications (24 reference statements)
“…Our work mainly considers L1 cache and our bypass policy is based on reuse distance prediction. A unified GPU on-chip memory design is proposed by Gebhart et al [14] to satisfy varying capacity needs across different applications. LLC management policies for 3D scene rendering workloads on GPUs are explored by Gaur et al [13], while our work focuses on general purpose applications.…”
Section: B. GPU Cache Management (mentioning)
confidence: 99%
“…TSIMT register files use the same basic design idea as the register files of conventional GPUs: Instead of using costly multiported memories, multiple single-ported SRAM banks are used [Lindholm et al 2008b; Gebhart et al 2012]. These register banks are connected using a crossbar to an operand collector.…”
Section: Register File (mentioning)
confidence: 99%
“…We implement a malleable memory system proposed by Gebhart et al that allows flexible use of on-chip SRAM to optimize energy efficiency [25]. Rather than having a fixed pool of registers per thread and cache-capacity per thread or compute cluster, malleable memory allows the compiler to identify and expose the number of registers that will be needed for any given kernel execution.…”
Section: B. Throughput Optimized Core Architecture (mentioning)
confidence: 99%
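
The partitioning idea described in the statement above can be illustrated with a minimal sketch. It is not the mechanism from the cited paper: the function name `partition_unified_memory`, the 4-byte register width, and all capacities are assumptions chosen for illustration. It only shows the arithmetic of carving one pool into registers, scratchpad, and cache once the compiler has reported a kernel's per-thread register count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Byte budget for each function carved out of one unified pool."""
    register_bytes: int
    scratch_bytes: int
    cache_bytes: int


def partition_unified_memory(total_bytes, threads, regs_per_thread,
                             scratch_bytes, bytes_per_reg=4):
    """Hypothetical helper: split a unified on-chip memory per kernel.

    Registers and scratchpad are sized from the kernel's stated needs;
    whatever capacity is left over is used as cache.
    """
    register_bytes = threads * regs_per_thread * bytes_per_reg
    needed = register_bytes + scratch_bytes
    if needed > total_bytes:
        raise ValueError("kernel needs more local storage than is available")
    return Partition(register_bytes, scratch_bytes, total_bytes - needed)


# Illustrative numbers only (assumed, not taken from the paper).
p = partition_unified_memory(total_bytes=384 * 1024, threads=1024,
                             regs_per_thread=32, scratch_bytes=48 * 1024)
print(p)
```

With these assumed inputs, a kernel that uses fewer registers per thread leaves more of the pool available as cache, which is the flexibility a hard-partitioned design cannot offer.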