2005
DOI: 10.1109/tc.2005.92

Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors

Cited by 42 publications (32 citation statements)
References 37 publications
“…Approaches have included instruction compression and scheduling to minimize the instruction bus activity factor [5]. The indexed IRF architecture effectively implements compressed instruction loads, but could potentially benefit from wire activity-aware scheduling.…”
Section: Related Work (mentioning)
confidence: 99%
“…These two approaches have been shown to reduce instruction fetch power by 58% and 60% relative to a standard I-cache, respectively [6], [2]. Clustered loop buffers [5] were proposed to build an efficient and scalable loop buffer for an 8-wide VLIW processor, achieving 63% energy savings. Their implementation is similar to the Distributed Indexed configuration described here, but their distributed control resulted in worse efficiency, possibly due to disregarding the wire energy.…”
Section: Related Work (mentioning)
confidence: 99%
“…The loop buffer is used when the processor is running a loop-dominated code and the instructions are fetched from the L1 Instruction Memory. For further details on the loop buffer and its operation, the reader is referred to [42]. Both types of caches (data and instruction) are connected to a unified level 2 cache, which is in turn connected to an external memory.…”
Section: Initial VLIW Architecture (mentioning)
confidence: 99%
“…1 shows two data clusters, where each cluster contains three functional units. Next, an instruction cluster is formed by using a distributed instruction buffer (also called loop buffer [42,43]) across multiple-issue slots of the VLIW processor. Fig.…”
Section: Article In Press (mentioning)
confidence: 99%
“…For example, instruction cache contributes to 27% of the total energy consumed in the StrongARM processor [30]. To reduce instruction delivery energy (which in this paper refers to the energy consumed in instruction storage hierarchy), researchers have proposed extending the hierarchy by adding small instruction stores (typically 1KB or smaller) between the L1 instruction cache and the processor [25,28,29,14,24, …]…”
Section: Introduction (mentioning)
confidence: 99%