Decomposing the load-store queue by function for power reduction and scalability

Baugh, Lee W.; Zilles, Craig

doi:10.1147/rd.502.0287

Cited by 30 publications

(33 citation statements)

References 27 publications


“…Proposals can be grouped into three general classes. The first class maintains the age-ordered store queue structure but uses partitioning, filtering, hierarchy, dependence speculation, and speculative forwarding through the primary data cache or other structures to reduce the frequency of associative store queue search or the number of entries examined per search [2,5,12,18,20]. A second class avoids associative search by abandoning the conventional age-ordered structure and replacing it with a cache-like address-indexed structure [6,18,21,24].…”

Section: Related Work (mentioning)

confidence: 99%

“…Associative search constrains the scalability of the store queue, which in turn constrains the scalability of the entire instruction window. To address this challenge, recent work has proposed to reduce both search frequency and the number of entries that must be searched [2,5,12,15,18,20], to replace the fully-associative age-indexed store queue with a set-associative address-indexed forwarding structure [6,21,24], or to maintain the age-ordered structure but replace associative search with speculative indexed access [19,22]. This paper presents NoSQ (short for No Store Queue and pronounced like "mosque"), a microarchitecture that implements in-flight store-load communication without a store queue or any other intermediary structure.…”

Section: Introduction (mentioning)

confidence: 99%


NoSQ: Store-Load Communication without a Store Queue

2007


This paper presents NoSQ (short for No Store Queue), a microarchitecture that performs store-load communication without a store queue and without executing stores in the out-of-order engine. NoSQ implements store-load communication using speculative memory bypassing (SMB), the dynamic short-circuiting of DEF-store-load-USE chains to DEF-USE chains. Whereas previous proposals used SMB as an opportunistic complement to conventional store queue-based forwarding, NoSQ uses SMB as a store queue replacement.

NoSQ relies on two supporting mechanisms. The first is an advanced store-load bypassing predictor that for a given dynamic load can predict whether that load will bypass and the identity of the communicating store. The second is an efficient verification mechanism for both bypassed and non-bypassed loads using in-order load re-execution with an SMB-aware store vulnerability window (SVW) filter.

The primary benefit of NoSQ is a simple, fast datapath that does not contain store-load forwarding hardware; all loads get their values either from the data cache or from the register file. Experiments show that this simpler design, despite being more speculative, slightly outperforms a conventional store-queue based design on most benchmarks (by 2% on average).

Reprinted from Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006, MICRO-39, pages 285-296. Copyright 2006 IEEE.


“…This L1 structure is backed up by a much larger second-level (L2) structure to correct/complement the work of the L1 structure. The L1 structure can be allocated according to program order or execution order (within a bank, if banked) for every store [1,8,24] or only allocated to those stores predicted to be involved in forwarding [3,17]. The L2 structure is also used in varying ways due to different focuses.…”

Section: Highlight Of Optimized and Alternative Designs (mentioning)

confidence: 99%

“…The L2 structure is also used in varying ways due to different focuses. It can be banked to save energy per access [3,17]; it can be filtered to reduce access frequency (and thus energy) [1,19]; or it can be simplified in functionality such as removing the forwarding capability [24].…”

Section: Highlight Of Optimized and Alternative Designs (mentioning)

confidence: 99%

“…Besides the difference in focus, the mechanisms we use in the speculation and the recovery from mis-speculation also differ from this prior work. In a two-level disambiguation approach [1,3,8,17,24], the fundamental action is still that of an exact disambiguation: comparing addresses and figuring out age relationship to determine the right producer store to forward from. The speculation is on the scope of the disambiguation: only a subset of the stores are inspected.…”

Section: Differences Between SMDE and Other Approaches (mentioning)

confidence: 99%


Slackened Memory Dependence Enforcement: Combining Opportunistic Forwarding with Decoupled Verification

Garg, Rashid, Huang

33rd International Symposium on Computer Architecture (ISCA'06)


A Power-Efficient and Scalable Load-Store Queue Design

Castro, Chaver, Piñuel, et al.

2005

Lecture Notes in Computer Science


Abstract. The load-store queue (LQ-SQ) of modern superscalar processors is responsible for keeping the order of memory operations. As the performance gap between processing speed and memory access becomes worse, the capacity requirements for the LQ-SQ increase, and its design becomes a challenge due to its CAM structure. In this paper we propose an efficient load-store queue state filtering mechanism that provides a significant energy reduction (on average 35% in the LSQ and 3.5% in the whole processor), and only incurs a negligible performance loss of less than 0.6%.
