2006
DOI: 10.1109/tc.2006.4

Beating in-order stalls with "flea-flicker" two-pass pipelining

Year Published (citing works): 2006–2023


Citations: cited by 23 publications (25 citation statements)
References: 27 publications
Citation statements: 0 supporting, 25 mentioning, 0 contrasting

“…If we reduce EDA to a purely performance enhancing mechanism, it resembles a class of techniques represented by decoupled access/execute architecture [6], Slip-stream [7], [20], dual-core execution (DCE) [9], Fleaflicker [21], Tandem [18], and Paceline [19]. Among these, we compare to DCE as architecturally, it is perhaps the most closely related design: both try to avoid long-latency cache miss-induced stalls to improve performance.…”
Section: 01
Mentioning, confidence: 99%
“…In Section 5.3, we contrasted our approach with a class of designs using two passes to process a thread, including Slip-stream [7], [20], dual-core execution [9], Flea-flicker [21], Tandem [18], and Paceline [19]. Another class of related work is helper-threading (also called speculative precomputation) (e.g., [24]- [32]).…”
Section: Related Work
Mentioning, confidence: 99%
“…The leader core runs a shorter version based on the removal of ineffectual instructions while the checker core runs the unmodified program. Lastly, Flea-Flicker two pass pipelining [4] allows the leader core to return an invalid value on long-latency operations and proceed. In most of these schemes, the checker core takes advantage of program execution on the leader core by receiving preprocessed instruction streams, resolved branches, and L2 cache prefetches.…”
Section: Challenges In Coupling With a Faulty Core
Mentioning, confidence: 99%
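For readers unfamiliar with the two-pass idea this excerpt refers to, the sketch below is a hypothetical Python model (invented names and data structures, not the paper's hardware design): an advance pass writes a poison token instead of stalling on a load miss and keeps issuing in order, and a backup pass later re-executes only the deferred, poison-dependent instructions once the data has arrived.

```python
# Hypothetical illustration of two-pass ("flea-flicker") pipelining.
# Not the paper's implementation; a toy in-order model where the advance
# pass never stalls on a cache miss and the backup pass repairs results.

INVALID = object()  # poison token for miss-dependent values


class ToyCache:
    """Trivially models hits/misses; a prefetch makes the next access hit."""
    def __init__(self, resident):
        self.resident = set(resident)

    def hit(self, addr):
        return addr in self.resident

    def prefetch(self, addr):
        self.resident.add(addr)          # miss data arrives "later"

    def read(self, addr):
        return addr * 10                 # stand-in for memory contents


def alu(op, vals):
    return sum(vals) if op == "add" else vals[0]


def advance_pass(program, regs, cache):
    """First pass: on a load miss, write INVALID and keep going in order."""
    deferred = []
    for op, dst, srcs in program:
        if op == "load":
            addr = srcs[0]
            if cache.hit(addr):
                regs[dst] = cache.read(addr)
            else:
                regs[dst] = INVALID      # do not stall the pipeline
                cache.prefetch(addr)     # the miss still starts early
                deferred.append((op, dst, srcs))
        elif any(regs.get(s) is INVALID for s in srcs):
            regs[dst] = INVALID          # poison propagates to dependents
            deferred.append((op, dst, srcs))
        else:
            regs[dst] = alu(op, [regs[s] for s in srcs])
    return deferred


def backup_pass(deferred, regs, cache):
    """Second pass: re-execute only the instructions that saw poison."""
    for op, dst, srcs in deferred:
        if op == "load":
            regs[dst] = cache.read(srcs[0])   # data has arrived by now
        else:
            regs[dst] = alu(op, [regs[s] for s in srcs])


# Usage: r2 depends on a missing load, so both go to the backup pass,
# while the independent add to r3 completes during the advance pass.
program = [("load", "r1", [100]), ("add", "r2", ["r1", "r1"]),
           ("add", "r3", ["r0", "r0"])]
regs = {"r0": 7}
cache = ToyCache(resident=[])
deferred = advance_pass(program, regs, cache)
backup_pass(deferred, regs, cache)
print(regs)   # {'r0': 7, 'r1': 1000, 'r2': 2000, 'r3': 14}
```

The point of the toy usage is that the independent add to r3 retires in the advance pass instead of waiting behind the miss, which is exactly the in-order stall the two-pass scheme avoids.
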
“…As the memory wall problem has come to overshadow other aspects of processing, various forms of runahead execution have been proposed [21], [12], [7], [3], [4]. Runahead execution attempts to reduce the effect of the long memory latencies by increasing the memory-level parallelism.…”
Section: Introduction
Mentioning, confidence: 99%
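To make the memory-level-parallelism argument in this excerpt concrete, here is a hedged sketch (a hypothetical cost model, not any specific proposal's implementation): on a long-latency miss the core effectively checkpoints, scans ahead only to discover further misses and start them as prefetches, then discards the speculative work, so independent misses overlap instead of serializing.

```python
# Hypothetical sketch of runahead execution: speculative work past a miss is
# discarded, but the extra misses it exposes are issued as prefetches and
# overlap with the original miss (more memory-level parallelism).
# Names, latencies, and structure are illustrative only.

MISS_LATENCY = 200   # assumed cycles for a memory access

def run(program, memory_hits, runahead=True):
    """Estimate cycles for a stream of ('load', addr) / ('alu', None) ops."""
    cycles = 0
    outstanding = set()          # addresses already being fetched
    for i, (kind, addr) in enumerate(program):
        if kind == "load" and addr not in memory_hits and addr not in outstanding:
            outstanding.add(addr)
            if runahead:
                # Checkpoint here (not modeled), then keep scanning ahead
                # purely to find more misses and start them early.
                for kind2, addr2 in program[i + 1:]:
                    if kind2 == "load" and addr2 not in memory_hits:
                        outstanding.add(addr2)        # prefetch under the miss
                cycles += MISS_LATENCY                # one stall covers them all
                memory_hits |= outstanding            # everything has arrived
            else:
                cycles += MISS_LATENCY                # stall for this miss alone
                memory_hits.add(addr)
        else:
            cycles += 1                               # hit or ALU op: 1 cycle
    return cycles

# Usage: two independent misses. Without runahead they serialize;
# with runahead the second miss is prefetched under the first.
prog = [("load", 0xA0), ("alu", None), ("load", 0xB0), ("alu", None)]
print(run(list(prog), set(), runahead=False))  # 402 cycles
print(run(list(prog), set(), runahead=True))   # 203 cycles
```

Under the assumed 200-cycle miss latency, the two runs show the serialized versus overlapped behavior the excerpt describes: roughly one miss latency saved for every additional independent miss uncovered during runahead.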