2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) 2018
DOI: 10.1109/micro.2018.00037

FineReg: Fine-Grained Register File Management for Augmenting GPU Throughput

Cited by 11 publications (6 citation statements)
References 32 publications
“…GPUs have dozens of SMs, and the TB scheduler assigns TBs to SMs based on the scheduling policy. The number of assigned TBs is determined by thread count limit, shared memory size, and register file size [20], [25], [26].…”
Section: B. Baseline GPU Architecture (citation type: mentioning)
confidence: 99%
“…Three factors determine the number of threads that can be scheduled on SMs: the size of register file, the size of shared memory, and the maximum thread count limit [20], [25], [26]. TBs are scheduled on SMs until reaching one of the limitations.…”
Section: Thread Context-aware Register Cache (citation type: mentioning)
confidence: 99%
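
The two statements above describe the same rule: thread blocks (TBs) are placed on an SM until the thread count limit, the shared memory size, or the register file size is exhausted. A minimal sketch of that calculation follows. The names `SMLimits`, `TBDemand`, and `max_resident_tbs`, and the numeric limits in the usage note, are illustrative assumptions and are not taken from the cited papers. The sketch also ignores the per-SM TB count cap and the allocation granularity that real hardware applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource capacities (hypothetical field names)."""
    max_threads: int        # thread count limit
    shared_mem_bytes: int   # shared memory size
    registers: int          # register file size, in registers


@dataclass(frozen=True)
class TBDemand:
    """Resources one thread block requests (hypothetical field names)."""
    threads: int
    shared_mem_bytes: int
    regs_per_thread: int


def max_resident_tbs(sm: SMLimits, tb: TBDemand) -> int:
    """Return how many TBs fit on one SM before any single limit is hit."""
    # Each resource independently bounds the TB count.
    limits = [
        sm.max_threads // tb.threads,
        sm.registers // (tb.threads * tb.regs_per_thread),
    ]
    # A TB that uses no shared memory is not bounded by it.
    if tb.shared_mem_bytes > 0:
        limits.append(sm.shared_mem_bytes // tb.shared_mem_bytes)
    # The tightest bound decides the number of resident TBs.
    return min(limits)
```

With assumed capacities of 2048 threads, 96 KiB of shared memory, and 65536 registers, a TB of 256 threads using 16 KiB of shared memory and 40 registers per thread is bounded at 8 TBs by threads, 6 by shared memory, and 6 by registers, so 6 TBs are resident. Raising the demand to 64 registers per thread makes the register file the sole bottleneck at 4 TBs, which is the situation register file management schemes aim to relieve.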
“…RegMutex [25] improved performance by sharing a subset of physical registers between warps during the GPU kernel execution. FineReg [42] achieved a higher number of concurrent CTAs by partitioning the register file into two regions, one for active CTAs and another for pending CTAs. Register file slicing [20] proposed to split the data path into two 16-bit slices, which enables the register to save power by power gating a slice if storing narrow-values, or to improve performance by fetching two 16-bit values.…”
Section: Related Work (citation type: mentioning)
confidence: 99%