2011
DOI: 10.1145/2024723.2000093
Energy-efficient mechanisms for managing thread context in throughput processors

Abstract: Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on massively-threaded processors such as GPUs. First, we examine register file caching to replace accesses to the large main register file with accesses to a s…
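The register file caching idea in the abstract — a small structure that absorbs accesses before they reach the large, expensive main register file — can be illustrated with a minimal sketch. The class name, LRU replacement policy, and capacity below are illustrative assumptions, not the paper's actual design:

```python
# Hedged sketch of a register file cache (RFC) in front of a large main
# register file (MRF). LRU replacement and the capacity are assumptions
# for illustration; the paper's design may differ.
from collections import OrderedDict

class RegisterFileCache:
    def __init__(self, capacity=6):
        self.cache = OrderedDict()   # register id -> value, LRU order
        self.capacity = capacity
        self.rfc_hits = 0            # accesses served by the small RFC
        self.mrf_accesses = 0        # accesses that reach the large MRF

    def read(self, reg):
        if reg in self.cache:
            self.rfc_hits += 1
            self.cache.move_to_end(reg)      # refresh LRU position
            return self.cache[reg]
        self.mrf_accesses += 1               # miss: pay the expensive MRF read
        value = 0                            # stand-in for the MRF lookup
        self._fill(reg, value)
        return value

    def write(self, reg, value):
        # Results land in the RFC first; evictions write back to the MRF.
        self._fill(reg, value)

    def _fill(self, reg, value):
        if reg in self.cache:
            self.cache.move_to_end(reg)
        elif len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)   # evict LRU entry to the MRF
            self.mrf_accesses += 1           # write-back cost
        self.cache[reg] = value
```

The energy argument is that `rfc_hits` replace accesses to the large structure with accesses to a much smaller, cheaper one, so only `mrf_accesses` pay the full cost.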

Cited by 65 publications (72 citation statements). References 26 publications.
“…It only induces 1.5% performance overhead based on our evaluation across a large set of GPGPU benchmarks (detailed experimental methodologies are described in Section 4.1), which also matches the observation made in [4]. During the register renaming stage, the destination register ID is renamed to a free physical register.…”
Section: Memory Contention-aware TFET Register Allocation (supporting)
confidence: 79%
“…The output is written back to the counter; it will be read when the warp enters the pipeline, and a larger-than-one value in the counter implies the necessity of writing to the TFET-based register. In [4], Gebhart et al. found that 70% of the register values are read only once in GPGPU workloads. This implies that most TFET register values are read once; therefore, renaming the destination register to the TFET register usually causes a 2-cycle extra delay: one additional cycle during the value write-back, and another when it is read by a subsequent instruction.…”
Section: Methods (mentioning)
confidence: 99%
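The "70% of register values are read only once" observation cited above can be measured from an instruction trace. A minimal sketch, assuming an illustrative trace format of `(dest_reg, [src_regs])` tuples (not any paper's actual trace format):

```python
# Hedged sketch: fraction of produced register values read exactly once.
# The trace format (dest_reg, [src_regs]) is an illustrative assumption.
from collections import defaultdict

def read_once_fraction(trace):
    """trace: list of (dest_reg, [src_regs]) per instruction, in order."""
    reads = defaultdict(int)   # reads since the last write to each register
    read_counts = []           # read count of each value when it is killed
    for dest, srcs in trace:
        for s in srcs:
            reads[s] += 1
        if dest in reads:
            read_counts.append(reads[dest])  # old value killed by overwrite
        reads[dest] = 0                      # new value starts with 0 reads
    read_counts.extend(reads.values())       # values still live at the end
    return sum(1 for c in read_counts if c == 1) / len(read_counts)
```

A high read-once fraction is what makes a small register file cache or a slower (e.g. TFET-based) backing store attractive: most values need only one cheap read before they die.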
“…Another reason for warp-level divergence is warp scheduling policies, which may prioritize some warps over others in a TB. For example, the recently proposed two-level scheduling [9][17] tries to better overlap memory access latency with computation by intentionally making some warps run somewhat faster than others. In Figure 12, we compare the impact of two scheduling policies, round robin (labeled 'RR') and two-level (labeled '2L').…”
Section: Program-dependent Workload Imbalance (mentioning)
confidence: 99%
“…The warp scheduling policy considered selects the oldest ready instruction. We assume a bandwidth-limited memory with a fixed latency, following the methodology of (Gebhart et al., 2011). We calibrate the model parameters against microbenchmark results …”
Section: Simulation Methodology (unclassified)