Proceedings of the 2016 International Conference on Supercomputing
DOI: 10.1145/2925426.2926258

Exploiting Private Local Memories to Reduce the Opportunity Cost of Accelerator Integration

Abstract: We present Roca, a technique to reduce the opportunity cost of integrating non-programmable, high-throughput accelerators in general-purpose architectures. Roca exploits the insight that non-programmable accelerators are mostly made of private local memories (PLMs), which are key to the accelerators' performance and energy efficiency. Roca transparently exposes PLMs of otherwise unused accelerators to the cache substrate, thereby allowing the system to extract utility from accelerators even when they cannot di…

Citations: cited by 6 publications (4 citation statements)
References: 43 publications

“…While this optimization is efficient, it limits the reusability of the banks across different applications as it constrains the accesses to the same physical banks. Accelerator memories can also be "borrowed" by other components such as to expand the available cache space [22]. In any of these cases, memory resource utilization can be reduced or the same amount of physical memory can be used more effectively.…”
Section: B Resource Optimization (mentioning)
confidence: 99%
“…This work was shown to achieve an average of 2.28× performance speedup and 15% energy consumption reduction in HLS-generated accelerators. ROCA [22] is a technique which exposes PLMs of accelerators to the LLC while the accelerator is not in use. ROCA implements the necessary overhead to enable this, including an enlarged tag array in LLC to track cache blocks stored in PLMs, logic to allow accelerators to reclaim their PLMs, logic to disable cache access to the PLM based on accelerators' activity rate, logic to coalesce memories of various sizes and expose them to the LLC as one PLM, and logic to flush dirty cache blocks in PLMs.…”
Section: Taxonomy Of Existing Projects (mentioning)
confidence: 99%
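
The lend-and-reclaim behaviour described in the statement above can be illustrated with a toy model. This is a minimal sketch and not the Roca design. It assumes a single fully associative set with LRU replacement, and the names `BorrowingCache`, `lend_plm`, and `reclaim_plm` are invented for illustration. It covers only two of the listed ideas: extra capacity becomes available while the accelerator is idle, and when the accelerator reclaims its PLM the surplus blocks are evicted, with dirty ones written back.

```python
from collections import OrderedDict


class BorrowingCache:
    """Toy fully associative LRU cache whose capacity grows while a PLM is lent."""

    def __init__(self, base_ways, plm_ways):
        self.base_ways = base_ways      # ways always owned by the cache
        self.plm_ways = plm_ways        # hypothetical extra ways backed by an idle PLM
        self.plm_lent = False           # True while the accelerator is idle
        self.blocks = OrderedDict()     # addr -> dirty flag, least recently used first
        self.writebacks = []            # addresses written back to memory, in order

    @property
    def capacity(self):
        return self.base_ways + (self.plm_ways if self.plm_lent else 0)

    def _evict_down_to(self, limit):
        # Evict least recently used blocks until the cache fits in `limit` ways.
        while len(self.blocks) > limit:
            addr, dirty = self.blocks.popitem(last=False)
            if dirty:
                self.writebacks.append(addr)

    def access(self, addr, write=False):
        """Return True on a hit, False on a miss (the block is then filled)."""
        hit = addr in self.blocks
        dirty = self.blocks.pop(addr, False) or write
        self.blocks[addr] = dirty       # (re)insert as most recently used
        self._evict_down_to(self.capacity)
        return hit

    def lend_plm(self):
        # Accelerator is idle: expose its PLM ways to the cache.
        self.plm_lent = True

    def reclaim_plm(self):
        # Accelerator needs its PLM back: shrink and flush the surplus blocks.
        self.plm_lent = False
        self._evict_down_to(self.capacity)
```

In this model, calling `reclaim_plm()` on a cache that holds more blocks than `base_ways` evicts the least recently used ones first, so `writebacks` records exactly the dirty blocks that had to be flushed. A real design would also need per-set tags, variable PLM sizes, and an activity-based policy for deciding when lending is worthwhile, all of which are omitted here.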
“…Typically, the PLM features aggressive SRAM banking that provides multi-ported memory accesses to match the multiple parallel blocks of the accelerator datapath. Since a large portion of the accelerator area consists of the PLM banks, the opportunity cost of investing die real estate on specialized accelerators can be efficiently mitigated by reusing it as a non-uniform cache architecture (NUCA) substrate [10,12]. Being "out of core" and encapsulated in its own tile, an accelerator is simply accessed through the SCCI.…”
Section: A Scalable Architecture (mentioning)
confidence: 99%
“…In fact, even if the PLM could reach 90% of the chip area, the amount of data that can be stored on-chip is usually limited to few MBs [21]. Efficient methods have been proposed to reduce the footprint of the on-chip memory by exploiting sharing techniques [8,21,26], to perform data prefetch and reduce the latency in accessing the external memory [33], and to improve utilization of the silicon area dedicated to memory [11,9,14]. However, before this paper, there has been no comprehensive analysis of the effects of multiple accelerators processing concurrently large amounts of data accessed through off-chip memory.…”
Section: Related Work (mentioning)
confidence: 99%