2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) 2013
DOI: 10.1109/codes-isss.2013.6658992

Synthesis-friendly techniques for tightly-coupled integration of hardware accelerators into shared-memory multi-core clusters

Abstract: Several many-core designs tackle scalability issues by leveraging tightly-coupled clusters as building blocks, where low-latency, high-bandwidth interconnection between a small/medium number of cores and L1 memory achieves high performance/watt. Tight coupling of hardware accelerators into these multicore clusters constitutes a promising approach to further improve performance/area/watt. However, accelerators are often clocked at a lower frequency than processor clusters for energy efficiency reasons. In this p…

Citations: cited by 12 publications (10 citation statements)
References: 21 publications
“…While this strategy can result in orders-of-magnitude power reductions, it is also inflexible, as each block can perform a single function. These characteristics are even more pre-eminent when accelerators are shared by multiple cores [36], [37], because requests for accelerated functions must be arbitrated.…”
Section: State-of-the-art (mentioning)
confidence: 99%
“…Cong et al [10] also tackle the utilization wall by developing a heterogeneous multi-core architecture with shared-memory accelerators; their HW IPs communicate by means of shared L2 caches, accessible through NoC nodes. Previous work by our group (Burgio et al [8], Dehyadegari et al [16,17], Conti et al [11]) considers a tightly-coupled multi-core based on RISC32 cores sharing a L1 scratchpad and extend it with hardware processing units (HWPUs). HWPUs are managed by the software through an OpenMP-based programming model designed to mix parallelization and acceleration.…”
Section: Related Work (mentioning)
confidence: 99%
“…Figure 1 shows a diagram of the He-P2012 cluster extended with HWPEs for heterogeneous computing. As detailed in our previous work [11], HWPEs are designed as two separate modules:…”
Section: He-P2012: Heterogeneous P2012 (mentioning)
confidence: 99%
“…Dehyadegari [12] and Conti [13] exploit shared-memory as a communication medium between cores and accelerators. Our current and previous work [14], [15] assumes the same architecture, tackling also programmability and scalability issues.…”
Section: Related Workmentioning
confidence: 99%