Heterogeneous Computing System for Deep Learning

Maliţa, Mihaela; Popescu, George Vladut; Ștefan, Gheorghe

doi:10.1007/978-3-030-31756-0_10

Cited by 2 publications

(2 citation statements)

References 5 publications


“…As a comparison, in [28], the FPGAs were reported to provide up to 28.7 higher performance per watt than the corresponding GPU implementations for several LSTM models. Moreover, as stated in [29], the actual achieved performance in a GPU, for LSTM networks, ranges from 17.85% to 25% of the peak reported performance of the GPU; thus, for an NVIDIA Tesla K80 GPU, this translates from 3.33 to 4.66 GFLOPS/W of achieved performance. As a result, the created system, which triggers 5.29 GFLOPS/W, can outperform such a GPU.…”

Section: Performance Over Power Efficiency (mentioning)

confidence: 96%

A Novel FPGA-Based Intent Recognition System Utilizing Deep Recurrent Neural Networks

2021


In recent years, systems that monitor and control home environments through non-vocal and non-manual interfaces have been introduced to improve the quality of life of people with mobility difficulties. In this work, we present the reconfigurable implementation and optimization of such a novel system, which utilizes a recurrent neural network (RNN). As the real-world results demonstrate, FPGAs are very efficient at implementing RNNs. In particular, our reconfigurable implementation is more than 150× faster than a high-end Intel Xeon CPU executing the reference inference tasks. Moreover, the proposed system is more than 300× more energy efficient than the server CPU, while, in terms of the reported achieved GFLOPS/W, it outperforms even a server-tailored GPU. An additional important contribution of this work is that the implementation and optimization process demonstrated can also act as a reference for anyone implementing RNN inference tasks in reconfigurable hardware; this is further facilitated by the fact that our C++ code, which is tailored for a high-level-synthesis (HLS) tool, is distributed as open source and can easily be incorporated into existing HLS libraries.
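The efficiency comparison in the citation statement for this entry can be checked with a few lines of arithmetic. The sketch below only uses the figures quoted in that statement; the constant and function names are hypothetical, and the values themselves are taken as quoted, not independently verified.

```python
# Figures as quoted in the citation statement (assumed correct as reported).
ACHIEVED_FRACTION_LOW = 0.1785   # 17.85% of peak performance
ACHIEVED_FRACTION_HIGH = 0.25    # 25% of peak performance
GPU_ACHIEVED_LOW = 3.33          # GFLOPS/W, quoted lower bound for the GPU
GPU_ACHIEVED_HIGH = 4.66         # GFLOPS/W, quoted upper bound for the GPU
FPGA_ACHIEVED = 5.29             # GFLOPS/W, quoted for the FPGA system


def implied_peak(achieved_gflops_per_watt: float, fraction_of_peak: float) -> float:
    """Peak GFLOPS/W implied by an achieved value and its fraction of peak."""
    return achieved_gflops_per_watt / fraction_of_peak


def advantage(fpga: float, gpu: float) -> float:
    """Ratio of FPGA efficiency to GPU efficiency (>1 means the FPGA wins)."""
    return fpga / gpu


# Both quoted bounds should imply the same peak figure if they are consistent.
peak_from_low = implied_peak(GPU_ACHIEVED_LOW, ACHIEVED_FRACTION_LOW)
peak_from_high = implied_peak(GPU_ACHIEVED_HIGH, ACHIEVED_FRACTION_HIGH)

print(f"implied GPU peak: {peak_from_low:.2f} and {peak_from_high:.2f} GFLOPS/W")
print(f"FPGA vs best-case GPU:  {advantage(FPGA_ACHIEVED, GPU_ACHIEVED_HIGH):.2f}x")
print(f"FPGA vs worst-case GPU: {advantage(FPGA_ACHIEVED, GPU_ACHIEVED_LOW):.2f}x")
```

Both bounds imply nearly the same peak efficiency, so the quoted range is internally consistent, and the FPGA figure exceeds even the upper GPU bound.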


“…The architecture of the lowest level in the proposed hierarchy is defined by the data structure deployed in the local memories mem0 in the MAP array, the instructions executed in each cell by eng0, and the functions performed in the logdepth networks REDUCE and SCAN (see [13]). Shortly, the architecture can be defined as follows:…”

Section: The Abstract Model (mentioning)

confidence: 99%

A Recursive Hierarchy for Accelerator-Level Parallelism

Maliţa, Ștefan

2022

World Congress on Electrical Engineering and Computer Systems and Science


The emergence of the field of Accelerator-Level Parallelism (ALP), under the pressure of the ASICs imposed by the corporate space, requires a theoretical analysis to avoid the slippages that have characterized the evolution of parallel computing over the last decades. Ad hoc solutions imposed under time-to-market pressure have distorted the evolution of the field. The opportunity offered by the ALP challenge must be used to make last-minute corrections to the chaotic development of the parallel computing domain. The solution we propose is an attempt to reconsider parallelism from a double perspective: a purely theoretical one, based on a mathematical model, that of the partial recursive functions proposed by Stephen Kleene, and another that emerges under the pressure of the increasingly complex applications demanded by the IT market. Our proposal consists of the hierarchical recursive structuring of ALP, starting from the abstract MapScanReduce model that we have already proposed for parallel computing.
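To make the MAP, REDUCE and SCAN terms in this entry more concrete, here is a minimal software sketch of the three operations with logarithmic-depth reduce and scan. It is a generic illustration, not the authors' architecture: a plain list stands in for the values held in the cells' local memories, `fn` stands in for the instruction each cell executes, and all function names are hypothetical.

```python
from operator import add
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def map_cells(cells: List[T], fn: Callable[[T], T]) -> List[T]:
    """Apply the same function in every cell (conceptually in parallel)."""
    return [fn(value) for value in cells]


def reduce_log_depth(values: List[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Pairwise tree reduction; returns the result and the number of levels."""
    if not values:
        raise ValueError("cannot reduce an empty list")
    level, depth = list(values), 0
    while len(level) > 1:
        # Combine neighbours pairwise; an odd leftover passes through unchanged.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth


def scan_log_depth(values: List[T], op: Callable[[T, T], T]) -> Tuple[List[T], int]:
    """Inclusive prefix scan (Hillis-Steele style); returns prefixes and levels."""
    out, step, depth = list(values), 1, 0
    while step < len(out):
        # Each position combines with the value `step` places to its left.
        out = [out[i] if i < step else op(out[i - step], out[i])
               for i in range(len(out))]
        step, depth = step * 2, depth + 1
    return out, depth


cells = list(range(1, 9))                      # eight cells holding 1..8
squares = map_cells(cells, lambda x: x * x)    # MAP: square in every cell
total, reduce_depth = reduce_log_depth(squares, add)
prefixes, scan_depth = scan_log_depth(cells, add)
print(total, reduce_depth, prefixes, scan_depth)
```

For eight cells both networks finish in three levels, which is the log-depth behaviour the citation statement refers to.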
