2009 International Symposium on Systems, Architectures, Modeling, and Simulation 2009
DOI: 10.1109/icsamos.2009.5289226

FPGA implementation of a configurable cache/scratchpad memory with virtualized user-level RDMA capability

Abstract: We report on the hardware implementation of a local memory system for individual processors inside future chip multiprocessors (CMP). It intends to support both implicit communication, via caches, and explicit communication, via directly accessible local ("scratchpad") memories and remote DMA (RDMA). We provide run-time configurability of the SRAM blocks near each processor, so that part of them operates as 2nd level (local) cache, while the rest operates as scratchpad. We also strive to merge the communicatio…

Citations: cited by 20 publications (14 citation statements)
References: 25 publications (15 reference statements)
“…In order to optimize our cache-integrated design at a lower level, we carefully adapt the format of transfer descriptions via command buffers to the NoC packet header format, so that we can exploit corresponding field alignment. This approach saves about 19.3% of the logic in our integrated design, over our measurement reported in [15]. In the results presented, we assume that 1 FPGA lookup table (LUT) or 1 flip-flop is equivalent to 8 gates.…”
Section: Logic Cost Of Cache-integrated NI Mechanisms; citation type: mentioning
confidence: 83%
“…3 in a hardware prototype based on a Xilinx Virtex-5 FPGA. A previous version of the prototype was presented in [14]. The current version is a major rewrite of the code, carefully pursuing logic reuse, implementing event responses, three levels of NoC priority and some other features not present in the version of [14].…”
Section: The Hardware Prototype; citation type: mentioning
confidence: 99%
“…We present the design and implementation of an OpenMP runtime system for an FPGA prototype of the SARC architecture [9] which features explicitly managed on-chip local memories, explicit on-chip communication primitives including remote stores for producer-initiated short data transfers, RDMA operations for producer-initiated or consumer-initiated bulk data transfers, hardware event queues with automatically generated responses, and hardware counters [4].…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Our work on explicit communication and synchronization for the SARC architecture includes an FPGA prototype described in [2] and a longer description of the architecture, with performance measurements collected on the FPGA prototype [13].…”
Section: Related Work; citation type: mentioning
confidence: 99%