Performance evaluation of vector accesses in parallel memories using a skewed storage scheme

Harper, D. T.; Jump, J. Robert

doi:10.1145/17356.17394

Cited by 19 publications

(17 citation statements)

References 6 publications


“…Other papers focus on the memory interleaving schemes on vector systems [3,15,17,18,21,25]. Authors in [9], [3], and [17] study the skew schemes. Rau, Schlansker, and Yen propose a pseudo-random interleaving technique using the XOR function to randomize the mapping of references to memory modules in [15].…”

Section: Other Related Work (mentioning)

confidence: 99%

A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality

Zhang

Zhu

Zhang

2000

Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture



DRAM row-buffer conflicts occur when a sequence of requests on different rows goes to the same memory bank, causing much higher memory access latency than requests to the same row or to different banks. In this paper, we analyze the sources of row-buffer conflicts in the context of superscalar processors, and propose a permutation-based page interleaving scheme to reduce row-buffer conflicts and to exploit data access locality in the row-buffer. Compared with several existing schemes, we show that the permutation-based scheme dramatically increases the hit rates on DRAM row-buffers and reduces memory stall time of the SPEC95 and TPC-C workloads. The memory stall times of the workloads are reduced up to 68% and 50%, compared with the conventional cache line and page interleaving schemes, respectively.
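The conflict pattern this abstract describes can be illustrated with a minimal sketch. It assumes the permutation is an XOR of the bank index with an equal number of higher-order address bits, in the spirit of the XOR-based mapping mentioned in the citation statement above; the abstract itself does not spell out the function. The bit widths `PAGE_BITS` and `BANK_BITS` and both function names are hypothetical, chosen only for illustration.

```python
PAGE_BITS = 11  # hypothetical: 2 KiB DRAM page (row-buffer) size
BANK_BITS = 2   # hypothetical: 4 banks
BANK_MASK = (1 << BANK_BITS) - 1


def conventional_bank(addr: int) -> int:
    """Page interleaving: the bank index is the bits just above the page offset."""
    return (addr >> PAGE_BITS) & BANK_MASK


def permuted_bank(addr: int) -> int:
    """Assumed permutation: XOR the bank index with the next BANK_BITS address bits."""
    upper = (addr >> (PAGE_BITS + BANK_BITS)) & BANK_MASK
    return conventional_bank(addr) ^ upper


# Four addresses in different rows that share the same conventional bank index.
addrs = [row << (PAGE_BITS + BANK_BITS) for row in range(4)]
conventional = [conventional_bank(a) for a in addrs]
permuted = [permuted_bank(a) for a in addrs]
print(conventional, permuted)
```

Under these assumptions, the four addresses all land in bank 0 with conventional page interleaving, so consecutive requests to them would evict each other's rows, while the XOR mapping spreads them over four banks. For any fixed value of the upper bits the XOR is a bijection on bank indices, so pages within one row group still map to distinct banks.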


“…There are many proposals that exploit the bank organization of DRAM memory [40], [46], [27]. This is especially true in the vector processor domain.…”

Section: Related Work (mentioning)

confidence: 99%

A DRAM/SRAM memory scheme for fast packet buffers

García-Vidal

March

Cerdà

et al.

2006

IEEE Trans. Comput.


We address the design of high-speed packet buffers for Internet routers. We use a general DRAM/SRAM architecture for which previous proposals can be seen as particular cases. For this architecture, large SRAMs are needed to sustain high line rates and a large number of interfaces. A novel algorithm for DRAM bank allocation is presented that reduces the SRAM size requirements of previously proposed schemes by almost an order of magnitude, without having memory fragmentation problems. A technological evaluation shows that our design can support thousands of queues for line rates up to 160 Gbps.


“…When accessing streams in a matched-memory system, skewing and linear transformations also lead to conflict-free access to a single family of strides. The difference is that the average degradation when the access is done with a stride that is not conflict free can be reduced by the use of buffers [4].…”

Section: Introduction (mentioning)

confidence: 99%

Synchronized access to streams in SIMD vector multiprocessors

Peiron

Valero

Ayguadé

1994

Proceedings of the 8th International Conference on Supercomputing - ICS '94


The synchronized and simultaneous access to several vectors that form a single stream is typical in SIMD vector multiprocessors as well as in MIMD superscalar multiprocessors with decoupled access. In this paper we propose a block-interleaved storage scheme and an out-of-order access mechanism that allows conflict-free access to streams with an arbitrary initial address and constant stride between elements. The memory system can have any degree of unmatchness and we consider the use of either a crossbar or a multistage interconnection network. A maximal number of conflict-free families including the most commonly used strides can be obtained. We describe the hardware for address calculation and control and show that their additional costs are minimal compared with the cost of the hardware for in-order access. Finally, we evaluate the applicability of this technique to real loops from some programs of the Perfect Club and SPEC suites.
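The citation statement for this entry notes that skewing gives conflict-free access only for some strides. A minimal sketch of that effect follows, assuming a generic row-skewing rule in which each row of `num_modules` addresses is rotated by `skew` modules relative to the previous row. This is an illustrative textbook-style mapping, not necessarily the exact scheme evaluated by Harper and Jump or the block-interleaved scheme of this abstract; all function names and the module count are hypothetical.

```python
def interleaved_module(addr: int, num_modules: int) -> int:
    """Plain low-order interleaving: module = address mod M."""
    return addr % num_modules


def skewed_module(addr: int, num_modules: int, skew: int = 1) -> int:
    """Assumed row skewing: rotate each row of M addresses by `skew` modules."""
    row = addr // num_modules
    return (addr + skew * row) % num_modules


def modules_touched(mapping, stride: int, num_modules: int) -> set:
    """Set of modules hit by M consecutive elements of a stride-`stride` vector."""
    return {mapping(i * stride, num_modules) for i in range(num_modules)}


M = 8  # hypothetical number of memory modules
plain_stride_8 = modules_touched(interleaved_module, 8, M)
skew_stride_8 = modules_touched(skewed_module, 8, M)
skew_stride_9 = modules_touched(skewed_module, 9, M)
print(plain_stride_8, skew_stride_8, skew_stride_9)
```

With these assumed parameters, a stride-8 vector hits a single module under plain interleaving but all eight modules under the skewed mapping, while a stride-9 vector under the same skew touches only four modules. That residual conflict is the kind of degradation the citation statement says buffering can reduce.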
