VThreads: A novel VLIW chip multiprocessor with hardware-assisted PThreads

Chouliaras, V.A.; Stevens, David; Dwyer, Vincent M.

doi:10.1016/j.micpro.2016.07.010

Cited by 3 publications

(3 citation statements)

References 36 publications

(18 reference statements)

“…This is further compounded by targeting FPGAs which are less forgiving on mux-heavy designs such as a VLIW CMP. This key observation is elaborated in the companion paper [17] and will be mitigated in the next generation of the silicon using a further HDL parameter. […] as a ratio of the FPGA slice utilisation; with the single context configuration requiring 5% of the slices, whereas the 8C-8B requires 37.6%.…”

Section: A Ilp (mentioning)

confidence: 99%

“…This is achieved through our core contributions which include the instantiation on the SoC-FPGA of the LE1 CMP and the subsequent on-line compilation of OpenCL kernels by our framework targeting the LE1 silicon. We also note that the LE1 is a capable MIMD accelerator, can easily accommodate shared-memory programming models such as OpenMP and POSIX Threads (PThreads) [17] and due to the proposed source transformation/compilation flow (Section IV), it does not suffer software-incurred performance inefficiencies due to thread divergence. The authors are unaware of any current heterogeneous systems that use fully configurable general-purpose, manycore, VLIW microprocessors as OpenCL accelerators on SoC-FPGAs.…”

Section: A Motivation (mentioning)

confidence: 99%

“…We have chosen to demonstrate the performance of the LE1 with no decoupling as this is the worst-possible case. A detailed description of an improved LE1 micro-architecture which fully alleviates this issue has been submitted [17] and is briefly discussed in Section VII. We also note that simple re-ordering of the produced binary by our back-end can eliminate practically all these stalls on the existing LE1; however, this hasn't been included in the current OpenCL framework.…”

Section: A Ilp (mentioning)

confidence: 99%

An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor

Parker

Chouliaras

2016

Journal of Systems Architecture

Evaluation of a low overhead predication system for a deterministic VLIW architecture targeting real-time applications

Starke

Carminati

2017

Microprocessors and Microsystems

Co-Design of Multicore Hardware and Multithreaded Software for Thread Performance Assessment on an FPGA

Adam

2022

Computers

Multicore and multithreaded architectures increase the performance of computing systems. The increase in cores and threads, however, raises further issues in the efficiency achieved in terms of speedup and parallelization, particularly for the real-time requirements of Internet of things (IoT)-embedded applications. This research investigates the efficiency of a 32-core field-programmable gate array (FPGA) architecture, with memory management unit (MMU) and real-time operating system (OS) support, to exploit the thread level parallelism (TLP) of tasks running in parallel as threads on multiple cores. The research outcomes confirm the feasibility of the proposed approach in the efficient execution of recursive sorting algorithms, as well as their evaluation in terms of speedup and parallelization. The results reveal that parallel implementation of the prevalent merge sort and quicksort algorithms on this platform is more efficient. The increase in the speedup is proportional to the core scaling, reaching a maximum of 53% for the configuration with the highest number of cores and threads. However, the maximum magnitude of the parallelization (66%) was found to be bounded to a low number of two cores and four threads. A further increase in the number of cores and threads did not add to the improvement of the parallelism.
