Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors

Jayapala, Murali; Barat, Francisco; Aa, Tom Vander; Catthoor, Francky; Corporaal, Henk; Deconinck, Geert

doi:10.1109/tc.2005.92

Cited by 42 publications

(32 citation statements)

References 37 publications


“…Approaches have included instruction compression and scheduling to minimize the instruction bus activity factor [5]. The indexed IRF architecture effectively implements compressed instruction loads, but could potentially benefit from wire activity-aware scheduling.…”

Section: Related Work (mentioning)

confidence: 99%

“…These two approaches have been shown to reduce instruction fetch power by 58% and 60% relative to a standard I-cache, respectively [6], [2]. Clustered loop buffers [5] were proposed to build an efficient and scalable loop buffer for an 8-wide VLIW processor, achieving 63% energy savings. Their implementation is similar to the Distributed Indexed configuration described here, but their distributed control resulted in worse efficiency, possibly due to disregarding the wire energy.…”

Section: Related Work (mentioning)

confidence: 99%


Hierarchical Instruction Register Organization

Black-Schaffer, Balfour, Dally, et al. 2008

IEEE Comput. Arch. Lett.

Abstract: This paper analyzes a range of architectures for efficient delivery of VLIW instructions for embedded media kernels. The analysis takes an efficient Filter Cache as a baseline and examines the benefits from 1) removing the tag overhead, 2) distributing the storage, 3) adding indirection, 4) adding efficient NOP generation, and 5) sharing instruction memory. The result is a hierarchical instruction register organization that provides a 56% energy and 40% area savings over an already efficient Filter Cache.

Index Terms: energy-efficient embedded processor architecture, hierarchical and distributed instruction register organization, VLIW instruction delivery



“…The loop buffer is used when the processor is running a loop-dominated code and the instructions are fetched from the L1 Instruction Memory. For further details on the loop buffer and its operation, the reader is referred to [42]. Both types of caches (data and instruction) are connected to a unified level 2 cache, which is in turn connected to an external memory.…”

Section: Initial VLIW Architecture (mentioning)

confidence: 99%

“…1 shows two data clusters, where each cluster contains three functional units. Next, an instruction cluster is formed by using a distributed instruction buffer (also called loop buffer [42,43]) across multiple-issue slots of the VLIW processor. Fig.…”

Section: Article In Press (mentioning)

confidence: 99%

Joint hardware–software leakage minimization approach for the register file of VLIW embedded architectures

Atienza, Raghavan, Ayala, et al. 2008

Integration

New applications demand very high processing power when run on embedded systems. Very Long Instruction Word (VLIW) architectures have emerged as a promising alternative to provide such processing capabilities under the given energy budget. However, in these new VLIW-based architectures, the register file is a very critical contributor to the overall power consumption, and new approaches have to be proposed to reduce its power while preserving system performance. In this paper, we propose a novel joint hardware-software approach that reduces the leakage energy in the register files of these embedded VLIW architectures. This approach relies upon an energy-aware register assignment method and hardware support that creates sub-banks in the global register file that can be switched on/off at run time. Our results indicate energy savings in the register file of up to 50%, after considering the overhead of the added hardware, for modern multimedia embedded applications without performance degradation. We illustrate this approach using real-life applications running on these processors. We also illustrate the tradeoff between the area overhead and the gains in leakage energy for the different strategies.


“…For example, instruction cache contributes to 27% of the total energy consumed in the StrongARM processor [30]. To reduce instruction delivery energy (which in this paper refers to the energy consumed in instruction storage hierarchy), researchers have proposed extending the hierarchy by adding small instruction stores (typically 1KB or smaller) between the L1 instruction cache and the processor [25,28,29,14,24, * This work was done when the author was at Stanford University.…”

Section: Introduction (mentioning)

confidence: 99%

Fine-grain dynamic instruction placement for L0 scratch-pad memory

Park, Balfour, Dally 2010

Proceedings of the 2010 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

We present a fine-grain dynamic instruction placement algorithm for small L0 scratch-pad memories (spms), whose unit of transfer can be an individual instruction. Our algorithm captures a large fraction of instruction reuse missed by coarse-grain placement algorithms whose unit of transfer is restricted to loops or functions within the capacity of spms. Evaluation of L0 spms with our fine-grain algorithm in 17 applications shows that the energy consumed by instruction storage hierarchy is reduced by 38% and 31% compared to that of L0 instruction caches and L0 spms with an ideal coarse-grain algorithm, respectively.

