Proceedings of the 22nd Annual International Symposium on Computer Architecture - ISCA '95 1995
DOI: 10.1145/223982.224444

Optimization of instruction fetch mechanisms for high issue rates

Citations: cited by 96 publications (1 citation statement)
References: 13 publications
“…The alignment network unit can be designed using a two-bank interleaved cache, so that two consecutive cache lines can be accessed simultaneously, and therefore a whole stride-one vector access, overlapped over two different cache lines, can be performed. This scheme requires three building blocks: an interchange switch, since it may be needed to swap the two cache lines, a shifter to align the lines accessed to the initial address, and a logic to mask the unused data based on the unalignment offset [54] (see figure 7.9). Using this scheme, the unaligned load can be performed in one cycle and the store requires an additional cycle because it first needs to shift and mask the data from the vector register and then to swap the partition for the two cache banks [202,190].…”
Section: Adding Support For Unaligned Loads and Stores
Citation type: mentioning
confidence: 99%
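
The three building blocks named in the citation statement above (interchange switch, shifter, mask) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from either paper: the 8-byte line size, the even/odd mapping of lines to banks, and the names `LINE_BYTES`, `read_line`, and `unaligned_load` are all hypothetical choices made for illustration.

```python
LINE_BYTES = 8  # hypothetical cache line size, chosen for readability


def read_line(memory, line_idx, line_bytes=LINE_BYTES):
    """Return one cache line as a list of byte values, zero-padded past the end."""
    start = line_idx * line_bytes
    data = list(memory[start:start + line_bytes])
    return data + [0] * (line_bytes - len(data))


def unaligned_load(memory, addr, line_bytes=LINE_BYTES):
    """Model a stride-one load of line_bytes bytes starting at any address."""
    line_idx, offset = divmod(addr, line_bytes)

    # Two-bank interleaved cache (assumed mapping: even lines in bank 0,
    # odd lines in bank 1), so two consecutive lines never collide.
    bank_out = [None, None]
    bank_out[line_idx % 2] = read_line(memory, line_idx, line_bytes)
    bank_out[(line_idx + 1) % 2] = read_line(memory, line_idx + 1, line_bytes)

    # Interchange switch: if the first line came out of bank 1, swap the
    # bank outputs so the lines are in address order.
    if line_idx % 2 == 1:
        first, second = bank_out[1], bank_out[0]
    else:
        first, second = bank_out[0], bank_out[1]

    # Shifter: rotate the concatenated lines left by the unalignment offset
    # so the requested byte sits in position 0.
    combined = first + second
    shifted = combined[offset:] + combined[:offset]

    # Mask: keep the requested bytes and discard the unused remainder.
    mask = [True] * line_bytes + [False] * line_bytes
    return [byte for byte, keep in zip(shifted, mask) if keep]


memory = bytes(range(32))
print(unaligned_load(memory, 11))
```

In this model, an access at address 11 reads line 1 from bank 1 and line 2 from bank 0, so the interchange switch swaps the bank outputs before the shift. The store path described in the statement would run the stages in the reverse order (shift and mask first, then swap), which is why it is said to need an extra cycle.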