2013
DOI: 10.1145/2514641.2514644

Multicore-based vector coprocessor sharing for performance and energy gains

Abstract: For most of the applications that make use of a dedicated vector coprocessor, its resources are not highly utilized due to the lack of sustained data parallelism which often occurs due to vector-length variations in dynamic environments. The motivation of our work stems from: (a) the mandate for multicore designs to make efficient use of on-chip resources for low power and high performance; (b) the omnipresence of vector operations in high-performance scientific and emerging embedded applications; (c) the need…

Cited by 14 publications (7 citation statements); references 15 publications.
“…The major differences of this paper from our earlier work [15,20] are: (a) Actual implementations on an FPGA using synthesizable VHDL (instead of higher-level SystemVerilog); (b) a larger number of benchmarked applications involving also many more scenarios; (c) the inclusion of fused multiply-add (MADD) and divide (VDIV) instructions in the vector lanes; (d) the production of power and energy consumption results followed by a relevant analysis; (e) scalability analysis for various configurations of the vector coprocessor involving 2, 4, 8, 16 and 32 lanes; (f) performance results for random scenarios involving two threads that contain vector kernels interleaved with idle times; and (g) synthesis frequency scalability analysis.…”
Section: Introduction
confidence: 65%
“…This context resembles fine-grain multithreading in superscalar processors, and increased throughput is expected because there are no data dependencies between instructions coming from different processors. More details about our VP architecture and Scheduler can be found in [20]. Table 2 shows resource consumption figures for our VP with 8 lanes and 8 memory banks configuration implemented in the Virtex XC5VLX100T FPGA device.…”
Section: Scheduling Procedures
confidence: 99%
“…“Ideal” times are obtained by removing any MB delay in issuing instructions to the VP. “Ideal without private memories” times are similar to ideal but, instead of having a private memory in each lane, each lane has access to all memory banks in the vector memory using a crossbar that connects lanes to memories (similar to the architecture in [13]). Under the worst case scenario for vector load and store instructions, only one element per clock cycle can be transferred between the lanes and the vector memory.…”
Section: Comparison With Prior Work
confidence: 99%
“…However, these vector-oriented designs do not address: a) the need to share resources in multicores for higher utilization while releasing silicon for the implementation of more cores or the enhancement of existing cores; b) runtime resource management of vector resources assigned to the cores since the collective needs of simultaneously running applications are normally in a fluid state; and c) runtime energy saving techniques that take into account individual application needs for vector processing [12,13].…”
Section: Introduction
confidence: 99%
“…VP sharing increases efficiency and lowers energy consumption. We present here the 40nm ASIC VP realization of a shared VP design that we first proposed in [12,13] in order to demonstrate its feasibility, and also investigate interesting design tradeoffs for embedded-system implementations. Sections II and III summarize the shared VP architecture and the ASIC design flow.…”
Section: Introduction
confidence: 99%