2001
DOI: 10.1109/12.956093

Improving latency tolerance of multithreading through decoupling

Abstract: The increasing hardware complexity of dynamically scheduled superscalar processors may compromise the scalability of this organization to make an efficient use of future increases in transistor budget. SMT processors, designed over a superscalar core, are therefore directly concerned by this problem. This work presents and evaluates a novel processor microarchitecture which combines two paradigms: simultaneous multithreading and access/execute decoupling. Since its decoupled units issue instructions i…

Cited by 12 publications (5 citation statements)
References 40 publications

“…Overall, they show that decoupled access/execute is an effective energy saving technique in many-core architectures as it requires simple hardware and, in addition, the number of thread contexts required to keep the functional units busy can be significantly reduced. It is worth noting that similar conclusions about the synergy of decoupling and multithreading were already suggested in [122].…”
Section: Decoupled Access/execute
Type: supporting
confidence: 74%
“…This small amount of threads allows the architecture to hide the latency of the functional units and keep them busy, which is an issue that the access/execute decoupling mechanism does not address. It is worth noting that similar conclusions about the synergy of decoupling and multithreading were already suggested in [122]. Figure 3.14 shows the normalized energy obtained by the different memory latency tolerance schemes.…”
Section: Multithreading Prefetching and Decoupled Access/execute
Type: supporting
confidence: 68%
“…IBM's POWER6 architecture implements multithreading and a restricted form of hardware scout, called load-lookahead prefetching [12]. Other work proposes dynamic instruction partitioning into separate threads and leverages out-of-order hardware in order to provide memory latency tolerance [19]. Additionally, many designs have implemented hardware prefetching.…”
Section: Related Work
Type: mentioning
confidence: 98%
“…Other contemporary decoupling work involves hardware partitioning and SMT [19]. In this work, the authors propose hardware partitioning of integer and floating-point instructions into separate threads in order to provide memory latency tolerance using large instruction queues to hold dependent floating-point instructions while they wait for the miss to return.…”
Section: Decoupled Techniques
Type: mentioning
confidence: 99%