The design of a high-performance fetch architecture can be challenging due to poor interconnect scaling and energy concerns. Way prediction has been presented as one means of scaling the fetch engine to shorter cycle times while providing energy-efficient instruction cache accesses. However, way prediction requires additional complexity to handle mispredictions. In this paper, we examine a high-bandwidth fetch architecture augmented with an instruction cache way predictor. We compare the performance and energy efficiency of this architecture to both a serial access cache and a parallel access cache. Our results show that a serial fetch architecture achieves approximately the same energy reduction and performance as way prediction architectures, without the added structures and recovery complexity needed for way prediction.

The performance of any architecture is limited by the amount of instruction fetch bandwidth that can be supplied to the execution core. Instruction cache performance is a vital part of achieving high fetch bandwidth. An energy-efficient fetch design that still achieves high performance is also important, because overall chip energy consumption may limit not only what can be integrated onto a chip but also how fast the chip can be clocked [7]. Brooks et al. [1] report that instruction fetch and the branch target buffer are responsible for 22.2% and 4.7%, respectively, of the power consumed by the Intel Pentium Pro. Brooks also reports that caches comprise 16.1% of the power consumed by the Alpha 21264. Montanaro et al. [6] found that the instruction cache consumes 27% of the power in their StrongARM 110 processor.

Set-associative cache designs can improve performance over a direct-mapped cache by reducing thrashing among cache blocks that map to the same cache index (i.e., among all ways within a cache set). This extra associativity comes at the price of increased energy.
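To make the set-indexing behavior behind thrashing concrete, the following is a minimal sketch of set-associative address decomposition. The cache geometry (32 KB, 2-way, 32-byte blocks) and the function name are illustrative assumptions, not parameters taken from this paper.

```python
# Illustrative address decomposition for a set-associative cache.
# Hypothetical geometry: 32 KB, 2-way, 32-byte blocks.
CACHE_BYTES = 32 * 1024
WAYS = 2
BLOCK_BYTES = 32
SETS = CACHE_BYTES // (WAYS * BLOCK_BYTES)  # 512 sets

def decompose(addr: int) -> tuple[int, int, int]:
    """Split an address into (tag, set index, block offset)."""
    offset = addr % BLOCK_BYTES
    index = (addr // BLOCK_BYTES) % SETS
    tag = addr // (BLOCK_BYTES * SETS)
    return tag, index, offset

# Two addresses exactly SETS * BLOCK_BYTES apart map to the same set
# with different tags, so they contend for that set's ways. With 2 ways,
# both can coexist; a direct-mapped cache (1 way) would thrash between them.
a = 0x1000
b = 0x1000 + SETS * BLOCK_BYTES
print(decompose(a), decompose(b))
```

The extra associativity resolves this contention, but every lookup must now consult more than one way, which is where the energy cost discussed above comes from.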
During a parallel cache access, both the tag and data components of all cache ways (blocks) in a given cache set (index) must be driven. If the tag component of one of the ways matches the desired address, then the corresponding data component of that way is selected for output. But regardless of which way matches the desired address, all ways in the set are driven on the bitlines of the cache to the logic that selects a single cache block to output.

Way prediction [4,13,9] has been proposed as a means to provide low-latency, energy-efficient cache access. Way prediction has been used in a number of real-world architectures, including the Alpha 21264 [10], which makes use of the Next Line and Set (NLS) predictor [3], a branch predictor with integrated way prediction. However, way prediction requires additional hardware to perform the actual way prediction, verify the correctness of a prediction, and recover in the event of a misprediction.

In this paper, we compare the performance of using way prediction [4,13,9,10,3] to using a serial access cache.

[Figure: parallel vs. serial access to a two-way set-associative cache, showing the decoder, tag arrays, data arrays (Way 0 and Way 1), column mux and sense amps, and data output.]
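The tradeoff among the three access schemes can be sketched with a toy counting model, using the number of tag and data arrays driven per access as a rough energy proxy. The per-access costs and the recovery policy (falling back to a serial-style access after a mispredict) are illustrative assumptions, not measurements from this paper.

```python
# Toy per-access cost model for an N-way set-associative cache.
# Array-read counts serve as a crude energy proxy; all numbers are
# illustrative assumptions rather than circuit-level measurements.
WAYS = 2

def parallel_access():
    # All tag AND all data ways are driven; a mux picks the matching way.
    return {"tag_reads": WAYS, "data_reads": WAYS, "extra_cycles": 0}

def serial_access():
    # Read tags first, then drive only the matching data way
    # (at the cost of an extra pipeline stage).
    return {"tag_reads": WAYS, "data_reads": 1, "extra_cycles": 1}

def predicted_access(hit_way, predicted_way):
    # Drive only the predicted way and verify its tag. On a mispredict,
    # assume a serial-style corrective access plus recovery latency.
    if predicted_way == hit_way:
        return {"tag_reads": 1, "data_reads": 1, "extra_cycles": 0}
    return {"tag_reads": 1 + WAYS, "data_reads": 1 + 1, "extra_cycles": 1}

print(parallel_access())
print(serial_access())
print(predicted_access(0, 0))  # correct prediction
print(predicted_access(0, 1))  # misprediction: recovery cost
```

Under this model, a correct way prediction is the cheapest access, but a serial access already avoids driving the non-matching data ways without needing a predictor, a verification path, or recovery logic, which is the comparison this paper sets out to quantify.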