2011
DOI: 10.1007/978-3-642-19137-4_2

Scalability Evaluation of a Polymorphic Register File: A CG Case Study

Abstract: We evaluate the scalability of a Polymorphic Register File using the Conjugate Gradient method as a case study. We focus on a heterogeneous multi-processor architecture, taking into consideration critical parameters such as cache bandwidth and memory latency. We compare the performance of 256 Polymorphic Register File-augmented workers against a single Cell PowerPC Processor Unit (PPU). In such a scenario, simulation results suggest that for the Sparse Matrix Vector Multiplication kernel, absolute sp…

citations
Cited by 7 publications
(7 citation statements)
references
References 13 publications
(17 reference statements)
“…A CG case study evaluated the PRF based system scalability in a heterogeneous multi-core architecture and showed CG acceleration by two orders of magnitude when using up to 256 PRF instances, with 32 vector lanes each. Moreover, a similar performance level could be achieved by fewer PRF instances than the cores needed in a Cell BE-based system, potentially saving area and power [5]. An FPGA implementation, prototyped in [6], can adjust additional PRF parameters during runtime (e.g., total storage size, number of lanes and ports), at the expense of lower clock frequency compared to the ASIC version presented here.…”
Section: Introduction (mentioning)
confidence: 97%
“…Previous studies ([4], [14]) have demonstrated that PRFs suit computationally intensive workloads such as Floyd, the Conjugate Gradient (CG) method and dense matrix multiplication. Moreover, PRFs could improve performance and efficiency in state of the art many-core computers, potentially saving area and power as shown in [5]. The benefits of two-dimensional (2D) PRFs are: i) improved storage efficiency, as the number of registers and their dimensions / sizes are dynamically following the workload requirements; and ii) performance gain, due to the reduced number of committed instructions.…”
Section: Introduction (mentioning)
confidence: 99%
“…A CG case study evaluated the PRF based system scalability in a heterogeneous multi-core architecture and showed CG acceleration by two orders of magnitude using up to 256 PRF cores, with 32 vector lanes each. Moreover, a similar performance level could be achieved by fewer PRF cores compared to a Cell BE-based system, potentially saving area and power [6].…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…Previous studies ([5], [16]) have shown that such PRFs are suitable for computationally intensive workloads such as Floyd, the Conjugate Gradient (CG) Method and dense matrix multiplication. It was also suggested that PRFs can improve the performance efficiency in state of the art many-core computers, potentially saving area and power [6]. More specifically, the potential benefits from using a 2D PRF are: i) improved storage efficiency, as the number of registers, their dimensions and sizes are customized to the workload requirements, and ii) performance gain, as the committed instructions number is greatly reduced.…”
Section: Introduction (mentioning)
confidence: 99%
“…Compared to the Cell CPU, PRFs decrease the number of instructions for a customized, high performance dense matrix multiplication by up to 35X [7] and improve performance for Floyd and sparse matrix vector multiplication [8]. A Conjugate Gradient case study evaluated the scalability of up to 256 PRF-based accelerators in a heterogeneous multi-core architecture, with two orders of magnitude performance improvements [11]. Furthermore, potential power and area savings were shown by employing fewer PRF cores compared to a system with Cell processors.…”
Section: Introduction (mentioning)
confidence: 99%