37th International Symposium on Microarchitecture (MICRO-37'04)
DOI: 10.1109/micro.2004.9
Cache Refill/Access Decoupling for Vector Machines


Cited by 17 publications (11 citation statements)
References 26 publications
“…Our results in Table 8 concur with the insight into realizable memory bandwidth by Batten et al [6]. That is, it is the control and buffering overhead in the processor (reorder buffer entries, physical registers, ld-st queue entries, outstanding cache miss trackers and buffering cost in caches and in the interconnect, etc.…”
Section: Stream and Saxpy (supporting, confidence: 88%)
“…Batten et al have noted that not only the access latency of memory sub-systems but also their bandwidth is very important to improve the application performance [8]. They have proposed an inexpensive non-blocking cache memory for vector architectures to improve the bandwidth and reduce the access latency of memory sub-systems.…”
Section: Related Work (mentioning, confidence: 99%)
“…As a result, the vector architecture can potentially achieve high computing performance for MMAs. Modern vector architectures usually employ a multibanked cache memory in order to improve their data transfer performance [5]- [8]. The memory subsystem with multibanked cache memory can provide data to parallelized functional units at a sufficient transfer rate.…”
Section: Introduction (mentioning, confidence: 99%)
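The multibanked cache organization mentioned in the statement above can be sketched in a few lines. This is a minimal illustration only; the bank count, line size, and interleaving scheme are assumptions, not details taken from the cited papers.

```python
NUM_BANKS = 8    # assumed bank count (illustrative)
LINE_BYTES = 64  # assumed cache-line size (illustrative)

def bank_of(addr: int) -> int:
    """Map a byte address to a cache bank by interleaving on the line index."""
    return (addr // LINE_BYTES) % NUM_BANKS

# A unit-stride vector load touches consecutive cache lines, so
# successive lines fall in distinct banks and can be serviced in
# parallel, which is how the banked cache sustains the transfer rate
# needed by parallel functional units.
banks = [bank_of(i * LINE_BYTES) for i in range(NUM_BANKS)]
```

With line-index interleaving, the eight consecutive lines above each map to a different bank; a strided or conflicting access pattern would instead serialize on a subset of banks.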
“…A major job of the compiler is to manage the utilization of the SRF. By contrast, this is much less of a concern for the Scale compiler due to Scale's cached shared memory model and decoupled cache refills [7]. Additionally, the stream processing compiler performs a binary search to determine the best strip-size when strip mining, while this is not an issue for Scale.…”
Section: Related Work (mentioning, confidence: 99%)
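Strip mining, mentioned in the statement above, can be illustrated with a minimal SAXPY sketch. The vector length and function name here are assumptions chosen for illustration; they are not taken from the Scale or stream-processing compilers being discussed.

```python
VLEN = 64  # assumed maximum hardware vector length (illustrative)

def saxpy_strip_mined(a, x, y):
    """Compute y <- a*x + y over strips of at most VLEN elements.

    Each inner strip corresponds to one full-width vector operation;
    the final strip handles the remainder when len(x) % VLEN != 0.
    """
    n = len(x)
    for start in range(0, n, VLEN):
        end = min(start + VLEN, n)
        for i in range(start, end):  # one vector op in hardware
            y[i] += a * x[i]
    return y
```

The binary search described in the quote would tune a value analogous to `VLEN` per loop nest, whereas Scale's cached shared memory model makes that strip-size choice largely unnecessary.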