2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca47549.2020.00042

Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors

Abstract: Flexible instruction scheduling is essential for performance in out-of-order processors. This is typically achieved by using CAM-based Instruction Queues (IQs) that provide complete flexibility in choosing ready instructions for execution, but at the cost of significant scheduling energy. In this work we seek to reduce the instruction scheduling energy by reducing the depth and width of the IQ. We do so by classifying instructions based on their readiness and criticality, and using this information to bypass th…
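The steering idea in the abstract can be illustrated in software. Below is a minimal sketch, not the paper's hardware design: a hypothetical dispatch stage that sends instructions either to a small out-of-order IQ or to a cheap in-order FIFO. The predicates is_critical (standing in for a criticality predictor) and operands_ready (a dispatch-time readiness check) are assumptions for illustration.

# Minimal sketch of readiness/criticality-aware IQ steering (illustrative only,
# not the paper's actual microarchitecture).
from collections import deque

class Scheduler:
    def __init__(self, ooo_iq_size=16):
        self.ooo_iq = []              # small CAM-based out-of-order IQ
        self.inorder_fifo = deque()   # cheap FIFO for bypassed instructions
        self.ooo_iq_size = ooo_iq_size

    def dispatch(self, insn, is_critical, operands_ready):
        # Ready instructions gain little from out-of-order wakeup/select, and
        # non-critical instructions tolerate being delayed: both can bypass
        # the expensive IQ and issue from the in-order FIFO instead.
        if operands_ready or not is_critical:
            self.inorder_fifo.append(insn)
        elif len(self.ooo_iq) < self.ooo_iq_size:
            self.ooo_iq.append(insn)
        else:
            return False  # stall dispatch: out-of-order IQ is full
        return True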

Cited by 10 publications (14 citation statements). References: 37 publications.
“…Long-term parking [33] saves power in an OoO core by allocating back-end resources for critical instructions while buffering non-critical instructions in the front-end. More recently, Alipour et al. [2] leverage instruction criticality and readiness to bypass the out-of-order back-end. Instructions that do not benefit from out-of-order scheduling and instructions that do not suffer from being delayed are sent to an in-order FIFO queue.…”
Section: Related Work (mentioning, confidence: 99%)
“…Long-term parking [16] saves power in an OoO core by allocating back-end resources for critical instructions while buffering non-critical instructions in the front-end. More recently, Alipour et al. [1] leverage instruction criticality and readiness to bypass the out-of-order back-end. Instructions that do not benefit from out-of-order scheduling and instructions that do not suffer from being delayed are sent to an in-order FIFO queue.…”
Section: Related Work (mentioning, confidence: 99%)
“…The sOoO cores are restricted out-of-order machines that add modest hardware overhead on top of a stall-on-use in-order core to improve instruction-level parallelism (ILP) as well as memory-hierarchy parallelism (MHP). Load Slice Core (LSC) [5] was the first work to propose an sOoO core; Freeway [10] builds upon the LSC proposal and exposes more MHP than LSC by adding one more in-order queue for uncovering additional independent loads. LSC and Freeway identify the address-generating instructions (AGIs) of loads and stores in an iterative manner using a hardware mechanism called Iterative Backward Dependence Analysis (IBDA).…”
(mentioning, confidence: 99%)
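IBDA, as described above, builds the backward slice of address-generating instructions incrementally over repeated loop iterations. The sketch below is a rough software analogue under simplifying assumptions; the real mechanism is a small hardware table indexed by instruction address, and the Insn fields here are illustrative, not taken from the cited designs.

# Rough software analogue of Iterative Backward Dependence Analysis (IBDA):
# each pass over the dynamic instruction stream marks the direct producers of
# loads/stores and of already-marked instructions, so the address-generating
# slice grows by one dependence level per loop iteration.
from collections import namedtuple

Insn = namedtuple("Insn", ["pc", "src_regs", "dst_regs", "is_mem_access"])

def ibda_pass(instruction_stream, slice_table: set):
    last_writer = {}  # architectural register -> PC of its latest producer
    for insn in instruction_stream:
        if insn.is_mem_access or insn.pc in slice_table:
            # Mark the producers of this instruction's source registers.
            for reg in insn.src_regs:
                if reg in last_writer:
                    slice_table.add(last_writer[reg])
        for reg in insn.dst_regs:
            last_writer[reg] = insn.pc
    return slice_table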
“…Latency-tolerating techniques such as out-of-order (OOO) execution [116] continue to execute independent instructions behind long-latency loads in the sequential instruction stream, avoiding pipeline stalls. The opportunities for reordering instructions have been further extended by systems that consider instruction criticality [3,20,66,81,90,91,102] and increase the number of instructions that can be reordered with delinquent loads.…”
(mentioning, confidence: 99%)
“…Continuous runahead engines [48] also introduce additional compute cores for executing (redundant) instructions to generate prefetches. OOO processors have exploited criticality mainly for improving energy efficiency [3,20,66,81,90,91,102] and for optimizing the cache hierarchy [10,87,115] but not for prefetching. These techniques also introduce complex hardware and storage requirements for classifying critical instructions at runtime.…”
(mentioning, confidence: 99%)